Basic yet complete Machine Learning pipeline for NLP tasks

Ivan

Last update: Aug 22, 2022

Related tags

Text Data & NLP ml-pipeline

Overview

Basic yet complete Machine Learning pipeline for NLP tasks

This repository accompanies the article on building basic yet complete ML pipelines for solving NLP tasks.

Requirements

Docker

telnet

Please refer to installation instructions for your system if needed.

Running the pipeline

The whole pipeline of 4 services (mail server, database, prediction service and orchestrator) can be started with one command:

docker-compose -f docker-compose.yaml up --build

It should start printing log messages from the services.

Sending an email

The pipeline is triggered by an unread email appearing in the mailbox. In order to send one, telnet util can be used.

Connecting to the IMAP mail server: telnet localhost 3025

Sending the email with telnet:

EHLO user
MAIL FROM:<[email protected]>
RCPT TO:<user>
DATA
Subject: Hello World
 
Hello!

She works at Apple now but before that she worked at Microsoft.
.
QUIT

If everything went well, something like this should appear in logs:

orchestrator_1                   | Polling mailbox...
prediction-worker_1              | INFO:     172.19.0.5:55294 - "POST /predict HTTP/1.1" 200 OK
orchestrator_1                   | Recorded to DB with id=34: [{'entity_text': 'Apple', 'start': 24, 'end': 29}, {'entity_text': 'Microsoft', 'start': 58, 'end': 67}]

Checking the result

The data must also be recorded to the database. In order to check that, any DB client can be used with the following connection parameters:

host: localhost
port: 5432
database: maildb
username: pguser
pasword: password

and running SELECT * FROM mail LIMIT 10 query.

Continuously update some NLP practice based on different tasks.

NLP_practice We will continuously update some NLP practice based on different tasks. prerequisites Software pytorch = 1.10 torchtext = 0.11.0 sklear

0 Jan 5, 2022

Code for the Findings of NAACL 2022(Long Paper): AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks

AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks arXiv link: upcoming To be published in Findings of NA

16 Nov 12, 2022

An easy-to-use framework for BERT models, with trainers, various NLP tasks and detailed annonations

FantasyBert English | 中文 Introduction An easy-to-use framework for BERT models, with trainers, various NLP tasks and detailed annonations. You can imp

137 Oct 26, 2022

Grading tools for Advanced NLP (11-711)Grading tools for Advanced NLP (11-711)

Grading tools for Advanced NLP (11-711) Installation You'll need docker and unzip to use this repo. For docker, visit the official guide to get starte

2 Sep 27, 2022

Text-Summarization-using-NLP - Text Summarization using NLP to fetch BBC News Article and summarize its text and also it includes custom article Summarization

Text-Summarization-using-NLP Text Summarization using NLP to fetch BBC News Arti

21 Aug 6, 2022

Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller`` whi ch allows you to build, view, manipulate and query pattern models.

Colibri Core by Maarten van Gompel, [email protected], Radboud University Nijmegen Licensed under GPLv3 (See http://www.gnu.org/licenses/gpl-3.0.html

122 Nov 17, 2022

Basic Utilities for PyTorch Natural Language Processing (NLP)

Basic Utilities for PyTorch Natural Language Processing (NLP) PyTorch-NLP, or torchnlp for short, is a library of basic utilities for PyTorch NLP. tor

2.1k Jan 1, 2023

1.9k Feb 3, 2021

1.9k Feb 18, 2021

Basic yet complete Machine Learning pipeline for NLP tasks

Related tags

Overview

Basic yet complete Machine Learning pipeline for NLP tasks

Requirements

Running the pipeline

Running the pipeline

Sending an email

Checking the result

You might also like...

Continuously update some NLP practice based on different tasks.

Code for the Findings of NAACL 2022(Long Paper): AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks

An easy-to-use framework for BERT models, with trainers, various NLP tasks and detailed annonations

Grading tools for Advanced NLP (11-711)Grading tools for Advanced NLP (11-711)

Text-Summarization-using-NLP - Text Summarization using NLP to fetch BBC News Article and summarize its text and also it includes custom article Summarization

Basic Utilities for PyTorch Natural Language Processing (NLP)

Basic Utilities for PyTorch Natural Language Processing (NLP)

Basic Utilities for PyTorch Natural Language Processing (NLP)

Owner

Ivan

TEACh is a dataset of human-human interactive dialogues to complete tasks in a simulated household environment.

A complete NLP guideline for enthusiasts

Prompt-learning is the latest paradigm to adapt pre-trained language models (PLMs) to downstream NLP tasks

:mag: Transformers at scale for question answering & neural search. Using NLP via a modular Retriever-Reader-Pipeline. Supporting DPR, Elasticsearch, HuggingFace's Modelhub...

DaCy: The State of the Art Danish NLP pipeline using SpaCy

TweebankNLP - Pre-trained Tweet NLP Pipeline (NER, tokenization, lemmatization, POS tagging, dependency parsing) + Models + Tweebank-NER

Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of speech tagging and word segmentation.

skweak: A software toolkit for weak supervision applied to NLP tasks

pysentimiento: A Python toolkit for Sentiment Analysis and Social NLP tasks

This is a project of data parallel that running on NLP tasks.