LazyText is inspired b the idea of lazypredict, a library which helps build a lot of basic models without much code.

Overview

LazyText

lazy

lazytext Documentation Code Coverage Downloads

LazyText is inspired b the idea of lazypredict, a library which helps build a lot of basic mpdels without much code. LazyText is for text what lazypredict is for numeric data.

  • Free Software: MIT licence

Installation

To install LazyText

pip install lazytext

Usage

To use lazytext import in your project as

from lazytext.supervised import LazyTextPredict

Text Classification

Text classification on BBC News article classification.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from lazytext.supervised import LazyTextPredict
import re
import nltk

# Load the dataset
df = pd.read_csv("tests/assets/bbc-text.csv")
df.dropna(inplace=True)

# Download models required for text cleaning
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('omw-1.4')

# split the data into train set and test set
df_train, df_test = train_test_split(df, test_size=0.3, random_state=13)

# Tokenize the words
df_train['clean_text'] = df_train['text'].apply(nltk.word_tokenize)
df_test['clean_text'] = df_test['text'].apply(nltk.word_tokenize)

# Remove stop words
stop_words=set(nltk.corpus.stopwords.words("english"))
df_train['text_clean'] = df_train['clean_text'].apply(lambda x: [item for item in x if item not in stop_words])
df_test['text_clean'] = df_test['clean_text'].apply(lambda x: [item for item in x if item not in stop_words])

# Remove numbers, punctuation and special characters (only keep words)
regex = '[a-z]+'
df_train['text_clean'] = df_train['text_clean'].apply(lambda x: [item for item in x if re.match(regex, item)])
df_test['text_clean'] = df_test['text_clean'].apply(lambda x: [item for item in x if re.match(regex, item)])

# Lemmatization
lem = nltk.stem.wordnet.WordNetLemmatizer()
df_train['text_clean'] = df_train['text_clean'].apply(lambda x: [lem.lemmatize(item, pos='v') for item in x])
df_test['text_clean'] = df_test['text_clean'].apply(lambda x: [lem.lemmatize(item, pos='v') for item in x])

# Join the words again to form sentences
df_train["clean_text"] = df_train.text_clean.apply(lambda x: " ".join(x))
df_test["clean_text"] = df_test.text_clean.apply(lambda x: " ".join(x))

# Tfidf vectorization
vectorizer = TfidfVectorizer()

x_train = vectorizer.fit_transform(df_train.clean_text)
x_test = vectorizer.transform(df_test.clean_text)
y_train = df_train.category.tolist()
y_test = df_test.category.tolist()

lazy_text = LazyTextPredict(
    classification_type="multiclass",
    )
models = lazy_text.fit(x_train, x_test, y_train, y_test)


Label Analysis
| Classes             | Weights              |
|--------------------:|---------------------:|
| tech                | 0.8725490196078431   |
| politics            | 1.1528497409326426   |
| sport               | 1.0671462829736211   |
| entertainment       | 0.8708414872798435   |
| business            | 1.1097256857855362   |

 Result Analysis
| Model                         | Accuracy            | Balanced Accuracy   | F1 Score            | Custom Metric Score | Time Taken          |
| ----------------------------: | -------------------:| -------------------:| -------------------:| -------------------:| -------------------:|
| AdaBoostClassifier            | 0.7260479041916168  | 0.717737172132769   | 0.7248335989941609  | NA                  | 1.829047679901123   |
| BaggingClassifier             | 0.8817365269461078  | 0.8796633962363677  | 0.8814695332332374  | NA                  | 3.5215072631835938  |
| BernoulliNB                   | 0.9535928143712575  | 0.9505929193425733  | 0.9533647387436917  | NA                  | 0.020041465759277344|
| CalibratedClassifierCV        | 0.9760479041916168  | 0.9760018220340847  | 0.9755904096436046  | NA                  | 0.4990670680999756  |
| ComplementNB                  | 0.9760479041916168  | 0.9752329192546583  | 0.9754237510855159  | NA                  | 0.013598203659057617|
| DecisionTreeClassifier        | 0.8532934131736527  | 0.8473956671194278  | 0.8496464898940103  | NA                  | 0.478792667388916   |
| DummyClassifier               | 0.2155688622754491  | 0.2                 | 0.07093596059113301 | NA                  | 0.008046865463256836|
| ExtraTreeClassifier           | 0.7275449101796407  | 0.7253518459908658  | 0.7255575847020816  | NA                  | 0.026398658752441406|
| ExtraTreesClassifier          | 0.9655688622754491  | 0.9635363285903302  | 0.9649837485086689  | NA                  | 1.6907336711883545  |
| GradientBoostingClassifier    | 0.9565868263473054  | 0.9543725191544354  | 0.9554606292723953  | NA                  | 39.16400766372681   |
| KNeighborsClassifier          | 0.938622754491018   | 0.9370053693959814  | 0.9367294513157219  | NA                  | 0.14803171157836914 |
| LinearSVC                     | 0.9745508982035929  | 0.974262691599302   | 0.9740343976103922  | NA                  | 0.10053229331970215 |
| LogisticRegression            | 0.968562874251497   | 0.9668995859213251  | 0.9678778814908909  | NA                  | 2.9565982818603516  |
| LogisticRegressionCV          | 0.9715568862275449  | 0.9708896757262861  | 0.971147482393915   | NA                  | 109.64091444015503  |
| MLPClassifier                 | 0.9760479041916168  | 0.9753381642512078  | 0.9752912960666735  | NA                  | 35.64296746253967   |
| MultinomialNB                 | 0.9700598802395209  | 0.9678795721187026  | 0.9689200656860745  | NA                  | 0.024427413940429688|
| NearestCentroid               | 0.9520958083832335  | 0.9499045135454718  | 0.9515097876015481  | NA                  | 0.024636268615722656|
| NuSVC                         | 0.9670658682634731  | 0.9656159420289855  | 0.9669719954040374  | NA                  | 8.287142515182495   |
| PassiveAggressiveClassifier   | 0.9775449101796407  | 0.9772388820754925  | 0.9770812340935414  | NA                  | 0.10332632064819336 |
| Perceptron                    | 0.9775449101796407  | 0.9769254658385094  | 0.9768161404324825  | NA                  | 0.07216000556945801 |
| RandomForestClassifier        | 0.9625748502994012  | 0.9605135542632081  | 0.9624462948504477  | NA                  | 1.2427525520324707  |
| RidgeClassifier               | 0.9775449101796407  | 0.9769254658385093  | 0.9769176825464448  | NA                  | 0.17272400856018066 |
| SGDClassifier                 | 0.9700598802395209  | 0.9695007868373973  | 0.969787370271274   | NA                  | 0.13134551048278809 |
| SVC                           | 0.9715568862275449  | 0.9703778467908902  | 0.9713021262026043  | NA                  | 8.388679027557373   |

Result of each estimator is stored in models which is a list and each trained estimator is also returned which can be used further for analysis.

confusion matrix and classification reports are also part of the models if they are needed.

print(models[0])
{
    'name': 'AdaBoostClassifier',
    'accuracy': 0.7260479041916168,
    'balanced_accuracy': 0.717737172132769,
    'f1_score': 0.7248335989941609,
    'custom_metric_score': 'NA',
    'time': 1.829047679901123,
    'model': AdaBoostClassifier(),
    'confusion_matrix': array([
        [ 89,   5,  12,  35,   3],
        [  8,  58,   5,  44,   0],
        [  5,   2, 108,  10,   1],
        [  5,   7,   5, 138,   2],
        [ 25,   5,   1,   3,  92]]),
 'classification_report':
 """
            precision    recall  f1-score   support
        0       0.67      0.62      0.64       144
        1       0.75      0.50      0.60       115
        2       0.82      0.86      0.84       126
        3       0.60      0.88      0.71       157
        4       0.94      0.73      0.82       126
 accuracy                           0.73       668
 macro avg       0.76      0.72     0.72       668
 weighted avg    0.75      0.73     0.72       668'}

Custom metrics

LazyText also support custom metric for evaluation, this metric can be set up like following

from lazytext.supervised import LazyTextPredict
# Custom metric
def my_custom_metric(y_true, y_pred):

    ...do your stuff

    return score


lazy_text = LazyTextPredict(custom_metric=my_custom_metric)
lazy_text.fit(X_train, X_test, y_train, y_test)

If the signature of the custom metric function does not match with what is given above, then even though the custom metric is provided, it will be ignored.

Custom model parameters

LazyText also support providing parameters to the esitmators. For this just provide a dictornary of the parameters as shown below and those following arguments will be applied to the desired estimator.

In the following example I want to apply/change the default parameters of SVC classifier.

LazyText will fit all the models but only change the default parameters for SVC in the following case.

from lazytext.supervisd
custom_parameters = [
    {
        "name": "SVC",
        "parameters": {
            "C": 0.5,
            "kernel": 'poly',
            "degree": 5
        }
    }
]


l = LazyTextPredict(
    classification_type="multiclass",
    custom_parameters=custom_parameters
    )
l.fit(x_train, x_test, y_train, y_test)
You might also like...
Repository containing the code for An-Gocair text normaliser

Scottish Gaelic Text Normaliser The following project contains the code and resources for the Scottish Gaelic text normalisation project. The repo can

Code Jam for creating a text-based adventure game engine and custom worlds

Text Based Adventure Jam Author: Devin McIntyre Our goal is two-fold: Create a text based adventure game engine that can parse a standard file format

Microsoft's Cascadia Code font customized to my liking.

Microsoft's Cascadia Code font customized to my liking. Also includes some simple batch patch and bake scripts to batch patch glyphs and bake font features into fonts!

Hamming code generation, error detection & correction.

Hamming code generation, error detection & correction.

Simple python program to auto credit your code, text, book, whatever!

Credit Simple python program to auto credit your code, text, book, whatever! Setup First change credit_text to whatever text you would like to credit

A minimal code sceleton for a textadveture parser written in python.

Textadventure sceleton written in python Use with a map file generated on https://www.trizbort.io Use the following Sockets for walking directions: n

Idea is to build a model which will take keywords as inputs and generate sentences as outputs.
Idea is to build a model which will take keywords as inputs and generate sentences as outputs.

keytotext Idea is to build a model which will take keywords as inputs and generate sentences as outputs. Potential use case can include: Marketing Sea

The sequel to SquidNet. It has many of the previous features that were in the original script, however a lot of the functions that do not serve much functionality have been removed.

SquidNet2 The sequel to SquidNet. It has many of the previous features that were in the original script, however a lot of the functions that do not se

A python script providing an idea of how a MindSphere application, e.g., a dashboard, can be displayed around the clock without the need of manual re-authentication on enforced session expiration

A python script providing an idea of how a MindSphere application, e.g., a dashboard, can be displayed around the clock without the need of manual re-authentication on enforced session expiration

A concept I came up which ditches the idea of
A concept I came up which ditches the idea of "layers" in a neural network.

Dynet A concept I came up which ditches the idea of "layers" in a neural network. Install Copy Dynet.py to your project. Run the example Install matpl

Ubuntu env build; Nginx build; DB build;

Deploy 介绍 Deploy related scripts bitnami Dependencies Ubuntu openssl envsubst docker v18.06.3 docker-compose init base env upload https://gitlab-runn

Aggrokatz is an aggressor plugin extension for Cobalt Strike which enables pypykatz to interface with the beacons remotely and allows it to parse LSASS dump files and registry hive files to extract credentials and other secrets stored without downloading the file and without uploading any suspicious code to the beacon. A :baby: buddy to help caregivers track sleep, feedings, diaper changes, and tummy time to learn about and predict baby's needs without (as much) guess work.
A :baby: buddy to help caregivers track sleep, feedings, diaper changes, and tummy time to learn about and predict baby's needs without (as much) guess work.

Baby Buddy A buddy for babies! Helps caregivers track sleep, feedings, diaper changes, tummy time and more to learn about and predict baby's needs wit

Lazymux is a tool installer that is specially made for termux user which provides a lot of tool mainly used tools in termux and its easy to use
Lazymux is a tool installer that is specially made for termux user which provides a lot of tool mainly used tools in termux and its easy to use

Lazymux is a tool installer that is specially made for termux user which provides a lot of tool mainly used tools in termux and its easy to use, Lazymux install any of the given tools provided by it from itself with just one click, and its often get updated.

When doing audio and video sentiment recognition, I found that a lot of code is duplicated, often a function in different time debugging for a long time, based on this problem, I want to manage all the previous work, organized into an open source library can be iterative. For their own use and others. PathPicker accepts a wide range of input -- output from git commands, grep results, searches -- pretty much anything.After parsing the input, PathPicker presents you with a nice UI to select which files you're interested in. After that you can open them in your favorite editor or execute arbitrary commands.
A simple script which allows you to see how much GEXP you earned for playing in the last Minecraft Hypixel server session

Project Landscape A simple script which allows you to see how much GEXP you earned for playing in the Minecraft Server Hypixel Usage Install python 3.

Ross Virtual Assistant is a programme which can play Music, search Wikipedia, open Websites and much more.

Ross-Virtual-Assistant Ross Virtual Assistant is a programme which can play Music, search Wikipedia, open Websites and much more. Installation Downloa

Releases(0.0.2)
Owner
Jay Vala
Data Scientist at scoutbee
Jay Vala
Build a translation program similar to Google Translate with Python programming language and QT library

google-translate Build a translation program similar to Google Translate with Python programming language and QT library Different parts of the progra

Amir Hussein Sharifnezhad 3 Oct 9, 2021
A Python app which can convert normal text to Handwritten text.

Text to HandWritten Text ✍️ Converter Watch Tutorial for this project Usage:- Clone my repository. Open CMD in working directory. Run following comman

Kushal Bhavsar 5 Dec 11, 2022
This repos is auto action which generating a wordcloud made by Twitter.

auto_tweet_wordcloud This repos is auto action which generating a wordcloud made by Twitter. Preconditions Install Python dependencies pip install -r

tubone(Yu Otsubo) 0 Apr 29, 2022
Convert text to morse code and play morse code sound.

Convert text(english) to morse codes and play morse sound!

Mohammad Dori 5 Jul 15, 2022
A Python package to facilitate research on building and evaluating automated scoring models.

Rater Scoring Modeling Tool Introduction Automated scoring of written and spoken test responses is a growing field in educational natural language pro

ETS 59 Oct 10, 2022
A generator library for concise, unambiguous and URL-safe UUIDs.

Description shortuuid is a simple python library that generates concise, unambiguous, URL-safe UUIDs. Often, one needs to use non-sequential IDs in pl

Stavros Korokithakis 1.8k Dec 31, 2022
Python library for creating PEG parsers

PyParsing -- A Python Parsing Module Introduction The pyparsing module is an alternative approach to creating and executing simple grammars, vs. the t

Pyparsing 1.7k Dec 27, 2022
A Python library that provides an easy way to identify devices like mobile phones, tablets and their capabilities by parsing (browser) user agent strings.

Python User Agents user_agents is a Python library that provides an easy way to identify/detect devices like mobile phones, tablets and their capabili

Selwin Ong 1.3k Dec 22, 2022
Etranslate is a free and unlimited python library for transiting your texts

Etranslate is a free and unlimited python library for transiting your texts

Abolfazl Khalili 16 Sep 13, 2022
py-trans is a Free Python library for translate text into different languages.

Free Python library to translate text into different languages.

I'm Not A Bot #Left_TG 13 Aug 27, 2022