A list of NLP (Natural Language Processing) tutorials

Overview

NLP Tutorial

A list of NLP (Natural Language Processing) tutorials built on PyTorch.

Table of Contents

A step-by-step tutorial on how to implement and adapt simple real-world NLP tasks.

Text Classification

News Category Classification

This repo provides a simple PyTorch implementation of text classification, with simple annotation. Here we use the HuffPost news corpus, in which each article comes with its category. The classification model trained on this dataset identifies the category of a news article based on its headline and description.
Keyword: CBoW, LSTM, fastText, Text categorization

IMDb Movie Review Classification

This text classification tutorial trains a transformer model on the IMDb movie review dataset for sentiment analysis. It provides a simple PyTorch implementation, with simple annotation.
Keyword: Transformer, Sentiment analysis

Question-Answer Matching

This repo provides a simple PyTorch implementation of Question-Answer matching. Here we use the corpus from Stack Exchange to build embeddings for entire questions. Using those embeddings, we find questions similar to a given question and show the answers that correspond to the questions found.
Keyword: CBoW, TF-IDF, LSTM with variable-length sequences
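
As a rough illustration of the TF-IDF keyword, the sketch below finds the most similar question by cosine similarity over TF-IDF vectors. It is not the repo's code: the function names, the smoothed IDF formula, and the toy questions are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # Smoothed IDF (an assumption of this sketch) keeps weights positive.
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query_index, vectors):
    """Index of the document closest to vectors[query_index], excluding itself."""
    scores = [
        (cosine(vectors[query_index], vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    return max(scores)[1]


# Toy questions, made up for illustration only.
questions = [
    "how do i reverse a list in python".split(),
    "how to reverse a python list".split(),
    "what is the capital of france".split(),
]
vectors = tfidf_vectors(questions)
best = most_similar(0, vectors)
```

Here `best` is the index of the second question, since the third shares no terms with the query. Dense embeddings such as those from a CBoW or LSTM encoder could replace the sparse vectors while keeping the same nearest-neighbour step.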

Movie Review Classification (Korean NLP)

This repo provides a simple Keras implementation of TextCNN for text classification. Here we use a movie review corpus written in Korean. The model trained on this dataset identifies the sentiment of a review based on its text.
Keyword: TextCNN, Sentiment analysis


Neural Machine Translation

English to French Translation - seq2seq

This neural machine translation tutorial trains a seq2seq model on a set of many thousands of English to French translation pairs to translate from English to French. It provides an intrinsic/extrinsic comparison of various sequence-to-sequence (seq2seq) models in translation.
Keyword: sequence to sequence network (seq2seq), Attention, Autoregressive, Teacher-forcing
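
To make the teacher-forcing keyword concrete, here is a minimal sketch of a decoding loop that chooses, at each step, between the ground-truth token and the model's own prediction as the next input. The names `decode_with_teacher_forcing` and `step_fn` are hypothetical, the per-step coin flip is an assumption (some implementations decide once per sequence), and a stub stands in for a real decoder.

```python
import random


def decode_with_teacher_forcing(step_fn, target, bos, teacher_forcing_ratio, rng):
    """Run a toy decoder over `target`, recording the input fed at each step.

    step_fn: hypothetical stand-in for one decoder step (previous token -> prediction).
    """
    inputs_used, predictions = [], []
    prev = bos
    for gold in target:
        inputs_used.append(prev)
        pred = step_fn(prev)
        predictions.append(pred)
        # Teacher forcing: feed the gold token; otherwise feed the prediction back.
        use_gold = rng.random() < teacher_forcing_ratio
        prev = gold if use_gold else pred
    return inputs_used, predictions


def stub_step(prev_token):
    """Stub decoder step that always predicts the same token."""
    return "x"


target = ["a", "b", "c"]
forced_inputs, _ = decode_with_teacher_forcing(
    stub_step, target, "<bos>", 1.0, random.Random(0)
)
free_inputs, _ = decode_with_teacher_forcing(
    stub_step, target, "<bos>", 0.0, random.Random(0)
)
```

With a ratio of 1.0 the decoder always sees the gold history (`forced_inputs`), and with 0.0 it sees only its own outputs (`free_inputs`), which is the autoregressive setting used at inference time.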

French to English Translation - Transformer

This neural machine translation tutorial trains a Transformer model on a set of many thousands of French to English translation pairs to translate from French to English. It provides a simple PyTorch implementation, with simple annotation.
Keyword: Transformer, SentencePiece


Natural Language Understanding

Neural Language Model

This repo provides a simple PyTorch implementation of Neural Language Model for natural language understanding. Here we implement unidirectional/bidirectional language models, and pre-train language representations from unlabeled text (Wikipedia corpus).
Keyword: Autoregressive language model, Perplexity
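
As a small worked example for the perplexity keyword: perplexity is the exponential of the average negative log-probability that the model assigns to each token. The sketch below assumes the per-token probabilities are already available; the function name and the sample values are illustrative only.

```python
import math


def perplexity(token_probs):
    """Perplexity = exp(mean negative log-probability per token)."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


# A model that spreads probability uniformly over 4 choices at every step.
uniform = perplexity([0.25, 0.25, 0.25, 0.25])

# A model that is more confident about the observed tokens.
confident = perplexity([0.9, 0.8, 0.95, 0.85])
```

The uniform case gives a perplexity of about 4, matching the intuition that the model is choosing among four equally likely tokens; lower values indicate a better fit to the text.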

Comments
  • Arabic to Urdu Machine Translation

    @lyeoni

    In the case I want to train an Arabic to Urdu Machine Translation:

    • is that attainable using this project?
    • what options should be set in training?
    • do you suggest another github project?
    opened by ghost 1
  • Little improvements for right indexes in vocabulary dictionaries

    Hi, @lyeoni! You have written great tutorials, and I really appreciate them. We can improve things a little with one line. Here, we fill the first key-value items of stoi and itos with the special tokens. I suggest inserting this line before the loop:

    special_tokens = filter(lambda x: x is not None, [self.unk_token, self.bos_token, self.eos_token, self.pad_token])

    If we don't set a value for self.unk_token but do set one for self.bos_token, the indexes in the dictionary become wrong, so we need to filter out the None values first.

    Input: vocab = Vocab(body, bos_token='<bos>'); vocab.build(); vocab.stoi;

    Wrong output: '<bos>': 1 ' ': 1, 'hi': 2, 'bear': 3, ...

    opened by datason 1
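
The suggestion in this issue can be sketched as follows. This is a simplified, hypothetical `Vocab` class written for illustration: it reuses the attribute names mentioned in the issue (`stoi`, `itos`, `unk_token`, `bos_token`, `eos_token`, `pad_token`) but is not the repo's implementation, and the frequency-based ordering of regular tokens is an assumption.

```python
from collections import Counter


class Vocab:
    """Hypothetical minimal vocabulary; not the repo's actual class."""

    def __init__(self, tokens, unk_token=None, bos_token=None,
                 eos_token=None, pad_token=None):
        self.tokens = tokens
        self.unk_token = unk_token
        self.bos_token = bos_token
        self.eos_token = eos_token
        self.pad_token = pad_token
        self.stoi, self.itos = {}, {}

    def build(self):
        # Drop unset special tokens first so the indexes stay contiguous from 0.
        special_tokens = [
            tok
            for tok in (self.unk_token, self.bos_token, self.eos_token, self.pad_token)
            if tok is not None
        ]
        for tok in special_tokens:
            self._add(tok)
        # Regular tokens follow, most frequent first (an assumption of this sketch).
        for tok, _ in Counter(self.tokens).most_common():
            self._add(tok)

    def _add(self, token):
        if token not in self.stoi:
            index = len(self.stoi)
            self.stoi[token] = index
            self.itos[index] = token


vocab = Vocab(["hi", "bear", "hi"], bos_token="<bos>")
vocab.build()
```

Because every index is taken from `len(self.stoi)` after filtering, setting only `bos_token` yields `'<bos>': 0` and no two tokens can share an index.
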
  • local variable 'MosesTokenizer' referenced before assignment

    The corresponding package is installed and the dataset is downloaded. When I run vocab.py, the following error occurs: “local variable 'MosesTokenizer' referenced before assignment”

    opened by pkly110 1
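
The cause of this error is not confirmed in the issue. One common way such a message arises in Python is an import inside a function whose failure is silently swallowed, so the name is never bound. The sketch below reproduces that pattern with a deliberately nonexistent module name (`hypothetical_missing_tokenizers`); both function names are also hypothetical, and whether the repo's vocab.py has this shape is an assumption.

```python
def load_tokenizer_fragile():
    """Swallows the import failure, so the later reference has nothing to bind to."""
    try:
        # Hypothetical module name, chosen so that the import fails here.
        from hypothetical_missing_tokenizers import MosesTokenizer
    except ImportError:
        pass  # the failure is hidden and MosesTokenizer stays unbound
    return MosesTokenizer()


def load_tokenizer_explicit():
    """Surfaces the real problem instead of failing later with a confusing error."""
    try:
        from hypothetical_missing_tokenizers import MosesTokenizer
    except ImportError as exc:
        raise ImportError(
            "MosesTokenizer could not be imported; check the installed "
            "tokenizer package and its version"
        ) from exc
    return MosesTokenizer()


try:
    load_tokenizer_fragile()
    fragile_error = None
except UnboundLocalError as exc:
    fragile_error = type(exc).__name__

try:
    load_tokenizer_explicit()
    explicit_error = None
except ImportError as exc:
    explicit_error = str(exc)
```

The first function fails with an `UnboundLocalError`, which older Python versions word as "local variable ... referenced before assignment"; the second reports the import problem directly.
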
  • Question about validate acc

    Thanks for your great job! I learned a lot. However, I have a question. I train the model for 7 epochs reaching a train acc of 95.2 and test(validate) acc of 85.2. Is that normal? Could the final test(validate) acc be higher after more epochs? Thanks!

    opened by ALUKErnel 1
  • typo in preprocessing?

    Hi, in the cleaning function in the script nlp-tutorial/news-category-classifcation/preprocessing.py, line 21 is written as text = re.sub(r'[!]{2,}', '?', text) # multiple ?s -> ?. There should be a ? in the first argument, so it should be text = re.sub(r'[?]{2,}', '?', text) # multiple ?s -> ?. Am I correct?

    opened by VirkSaab 0
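
The difference between the two patterns quoted in this issue can be checked directly. The helper below is a hypothetical stand-in, not the repo's cleaning function, and the separate rule for exclamation marks is an assumption added for contrast.

```python
import re


def collapse_repeated_punctuation(text):
    """Collapse runs of two or more identical marks into a single mark."""
    text = re.sub(r'[!]{2,}', '!', text)  # multiple !s -> !
    text = re.sub(r'[?]{2,}', '?', text)  # multiple ?s -> ?
    return text


sample = "Really??? Wow!!"
cleaned = collapse_repeated_punctuation(sample)

# The pattern quoted from line 21 rewrites exclamation runs as a question mark
# and leaves the repeated question marks untouched.
as_reported = re.sub(r'[!]{2,}', '?', sample)
```

With the corrected pattern, `cleaned` keeps one mark of each kind, while `as_reported` still contains `???` and turns `!!` into `?`.
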
  • question-answer-matching missing file

    Hi Lyeoni,

    First of all, thank you for your work on these tutorials, which are really interesting!

    I am trying to run the question-answer-matching tutorial and reproduce your evaluation. Unfortunately, I can't download the Posts.xml file from git lfs, as it looks like your subscription no longer allows downloads. By any chance, do you have that file hosted somewhere else? That would allow me to run the evaluation with your trained model.

    Thanks a lot and I wish you a nice day! :-)

    opened by JeremyWau 0
  • neural-machine-translation - nmt ZeroDivisionError: integer division or modulo by zero

    Traceback (most recent call last):

    File "", line 1, in runfile('D:/nlp-tutorial/neural-machine-translation/nmt/train.py', wdir='D:/nlp-tutorial/neural-machine-translation/nmt')

    File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile execfile(filename, namespace)

    File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile exec(compile(f.read(), filename, 'exec'), namespace)

    File "D:/nlp-tutorial/neural-machine-translation/nmt/train.py", line 254, in trainiters(pairs, encoder, decoder, n_iters)

    File "D:/nlp-tutorial/neural-machine-translation/nmt/train.py", line 184, in trainiters train_pairs += [random.choice(train_pairs) for i in range(n_iters%len(train_pairs))]

    ZeroDivisionError: integer division or modulo by zero

    opened by abdoelsayed2016 1
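
The failing expression in this traceback, `n_iters%len(train_pairs)`, divides by zero whenever `train_pairs` is empty, which suggests that no training pairs were loaded or that all of them were filtered out. A guard along the following lines would turn that into a clearer error. The function name, the `seed` parameter, and the error message are assumptions for this sketch, and it does not address why the list was empty.

```python
import random


def pad_training_pairs(train_pairs, n_iters, seed=None):
    """Append n_iters % len(train_pairs) randomly chosen pairs, as in the traceback."""
    if not train_pairs:
        # Fail early with a readable message instead of a ZeroDivisionError.
        raise ValueError(
            "train_pairs is empty; check that the dataset was loaded "
            "and not entirely filtered out"
        )
    rng = random.Random(seed)
    padded = list(train_pairs)
    padded += [rng.choice(train_pairs) for _ in range(n_iters % len(train_pairs))]
    return padded


# Toy pairs, made up for illustration only.
pairs = [("i am", "je suis"), ("you are", "tu es"), ("he is", "il est")]
padded = pad_training_pairs(pairs, n_iters=7, seed=0)

try:
    pad_training_pairs([], n_iters=7)
    empty_error = None
except ValueError as exc:
    empty_error = str(exc)
```

With three pairs and `n_iters=7`, one extra pair is appended (7 % 3 == 1), and an empty list raises a `ValueError` that points at the data-loading step.
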
Owner
Allen Lee
AI Research Engineer, NLP-holic
💫 Industrial-strength Natural Language Processing (NLP) in Python

spaCy: Industrial-strength NLP spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest researc

Explosion 24.9k Jan 2, 2023
Basic Utilities for PyTorch Natural Language Processing (NLP)

Basic Utilities for PyTorch Natural Language Processing (NLP) PyTorch-NLP, or torchnlp for short, is a library of basic utilities for PyTorch NLP. tor

Michael Petrochuk 2.1k Jan 1, 2023
A very simple framework for state-of-the-art Natural Language Processing (NLP)

A very simple framework for state-of-the-art NLP. Developed by Humboldt University of Berlin and friends. IMPORTANT: (30.08.2020) We moved our models

flair 12.3k Dec 31, 2022
Paradigm Shift in NLP - "Paradigm Shift in Natural Language Processing".

Paradigm Shift in NLP Welcome to the webpage for "Paradigm Shift in Natural Language Processing". Some resources of the paper are constantly maintaine

Tianxiang Sun 41 Dec 30, 2022
Python library for Serbian Natural language processing (NLP)

SrbAI - Python biblioteka za procesiranje srpskog jezika SrbAI je projekat prikupljanja algoritama i modela za procesiranje srpskog jezika u jedinstve

Serbian AI Society 3 Nov 22, 2022
Develop open-source Python Arabic NLP libraries that the Arab world will easily use in all Natural Language Processing applications

Develop open-source Python Arabic NLP libraries that the Arab world will easily use in all Natural Language Processing applications

BADER ALABDAN 2 Oct 22, 2022
Twitter-NLP-Analysis - Twitter Natural Language Processing Analysis

Twitter-NLP-Analysis Business Problem I got last @turk_politika 3000 tweets with

Çağrı Karadeniz 7 Mar 12, 2022
Indobenchmark are collections of Natural Language Understanding (IndoNLU) and Natural Language Generation (IndoNLG)

Indobenchmark Toolkit Indobenchmark are collections of Natural Language Understanding (IndoNLU) and Natural Language Generation (IndoNLG) resources fo

Samuel Cahyawijaya 11 Aug 26, 2022
LegalNLP - Natural Language Processing Methods for the Brazilian Legal Language

LegalNLP - Natural Language Processing Methods for the Brazilian Legal Language ⚖️ The library of Natural Language Processing for Brazilian legal lang

Felipe Maia Polo 125 Dec 20, 2022
Implementation of Natural Language Code Search in the project CodeBERT: A Pre-Trained Model for Programming and Natural Languages.

CodeBERT-Implementation In this repo we have replicated the paper CodeBERT: A Pre-Trained Model for Programming and Natural Languages. We are interest

Tanuj Sur 4 Jul 1, 2022
End-to-End Framework for building natural language search interfaces to data by utilizing Transformers and the State-of-the-Art of NLP. Supporting DPR, Elasticsearch, HuggingFace’s Modelhub and much more!

Haystack is an end-to-end framework that enables you to build powerful and production-ready pipelines for different search use cases. Whether you want

deepset 1.4k Feb 18, 2021
Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

Pattern Pattern is a web mining module for Python. It has tools for: Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM par

Computational Linguistics Research Group 8.4k Dec 30, 2022