nlp-tutorial is a tutorial for those who are studying NLP (Natural Language Processing) using PyTorch

Overview

nlp-tutorial

nlp-tutorial is a tutorial for those who are studying NLP (Natural Language Processing) using PyTorch. Most of the models in NLP are implemented in fewer than 100 lines of code (excluding comments and blank lines).

  • [08-14-2020] Old TensorFlow v1 code is archived in the archive folder. For beginner readability, only PyTorch version 1.0 or higher is supported.

Curriculum - (Example Purpose)

1. Basic Embedding Model

2. CNN(Convolutional Neural Network)

3. RNN(Recurrent Neural Network)

4. Attention Mechanism

5. Model based on Transformer

Dependencies

  • Python 3.5+
  • PyTorch 1.0.0+

Author

  • Tae Hwan Jung(Jeff Jung) @graykode
  • Author Email : [email protected]
  • Acknowledgements to mojitok for the NLP Research Internship.
Comments
  • A question about seq2seq-torch.py at line 43

    A question about seq2seq-torch.py at line 43

    Hi, I'm an NLP rookie and I want to ask you a question. Your code extracts the input (context) with a fixed window around line 43, and "word sequence" is a list of sentences, so some words may take their neighbour words from different sentences. Does this harm the results?

    And my training results don't seem very good, even though I didn't change the code. [image]

    If you see this issue, please answer when you have free time. Although my English is poor, I still want to express my gratitude to you.

    question 
    opened by MowanHu 4
  • Added greedy decoder input

    Added greedy decoder input

    In this PR, I added a greedy decoder function that generates the decoder input for inference. This is important for translating sentences, as we don't know the target input beforehand. In the paper, they mention that they ran beam search with k = 4; in the greedy approach, k = 1.

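    For reference, a minimal sketch of the idea (assuming a model whose forward pass takes the encoder input plus the decoder tokens produced so far, and a hypothetical start_symbol; this is not the exact code from the PR):

    import torch

    def greedy_decode(model, enc_input, start_symbol, tgt_len):
        # begin with only the start symbol and append the most likely token each step (k = 1)
        dec_input = torch.full((1, 1), start_symbol, dtype=torch.long)
        for _ in range(tgt_len):
            logits = model(enc_input, dec_input)         # [1, cur_len, tgt_vocab_size]
            next_token = logits[:, -1].argmax(dim=-1)    # greedy pick for the last position
            dec_input = torch.cat([dec_input, next_token.unsqueeze(0)], dim=1)
        return dec_input[:, 1:]                          # generated target tokens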
    opened by dmmiller612 4
  • Remove autograd.Variable and fix symbol P

    Remove autograd.Variable and fix symbol P

    1. The Variable API has been deprecated, so you can remove it. I provided an example with the basic attention model; if you agree, feel free to adjust it. https://pytorch.org/docs/stable/autograd.html#variable-deprecated

    2. The sequence-to-sequence model has an n_step variable (I guess you mean padding, right?), but I am a little confused that you use the same terminology, even though this is a tutorial.

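    For context, a minimal sketch of the change (since PyTorch 0.4, plain tensors carry autograd state, so the Variable wrapper is unnecessary):

    import torch
    # from torch.autograd import Variable   # deprecated; Variable(tensor) now just returns a Tensor

    # old style: x = Variable(torch.zeros(3, 5), requires_grad=True)
    x = torch.zeros(3, 5, requires_grad=True)   # current style: request gradients directly
    y = (x * 2).sum()
    y.backward()
    print(x.grad)                               # gradients are populated without any wrapper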
    opened by SpellOnYou 2
  • Seq2Seq PyTorch

    Seq2Seq PyTorch

    Hi, thanks for sharing your code.

    I've read your seq2seq implementation, and I was wondering about the RNN Encoder-Decoder model.

    In the paper 'Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation', they say:

    [figure from the paper]

    proposed gating unit: [figure]

    and I couldn't find the new hidden-state activation function in your code.

    Do you have any plan to add the proposed activation process, or is it okay to just skip that part?

    Thank you so much in advance.

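    For reference, the gating unit proposed in that paper is what later became the GRU. Below is a minimal hand-written sketch of the candidate ("new") hidden-state activation it describes, with weights as plain tensors and biases omitted for brevity; PyTorch's nn.GRU implements equivalent gating internally, so swapping nn.RNN for nn.GRU would also cover it.

    import torch

    def gru_cell(x, h_prev, W_r, U_r, W_z, U_z, W_h, U_h):
        r = torch.sigmoid(x @ W_r + h_prev @ U_r)           # reset gate
        z = torch.sigmoid(x @ W_z + h_prev @ U_z)           # update gate
        h_tilde = torch.tanh(x @ W_h + (r * h_prev) @ U_h)  # candidate hidden-state activation
        return z * h_prev + (1 - z) * h_tilde               # interpolate previous and candidate states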
    question 
    opened by DonghyungKo 2
  • Updated the transformer greedy method

    Updated the transformer greedy method

    In this PR, I added a greedy decoder function that generates the decoder input for inference. This is important for translating sentences, as we don't know the target input beforehand. In the paper, they mention that they ran beam search with k = 4; in the greedy approach, k = 1.

    opened by dmmiller612 2
  • Fix comment errors in NNLM

    Fix comment errors in NNLM

    Small mistakes in NNLM:

    X = self.C(X) # X : [batch_size, n_step, n_class]
    X = X.view(-1, n_step * m) # [batch_size, n_step * n_class]
    

    should be

    X = self.C(X) # X : [batch_size, n_step, m]
    X = X.view(-1, n_step * m) # [batch_size, n_step * m]
    

    #48

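    A quick shape check (assuming, as in the tutorial's NNLM, that self.C is an nn.Embedding(n_class, m) lookup):

    import torch
    import torch.nn as nn

    n_class, m, n_step, batch_size = 7, 2, 2, 3
    C = nn.Embedding(n_class, m)
    X = torch.randint(0, n_class, (batch_size, n_step))  # token indices
    emb = C(X)                       # [batch_size, n_step, m] -- last dim is m, not n_class
    flat = emb.view(-1, n_step * m)  # [batch_size, n_step * m]
    print(emb.shape, flat.shape)     # torch.Size([3, 2, 2]) torch.Size([3, 4])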
    opened by secsilm 1
  • Add "Launch in Deepnote" button

    Add "Launch in Deepnote" button

    Hi there, I'm Dan from Deepnote. We're a small startup working on a new, Jupyter-compatible data science notebook. I was trying out Deepnote with some cool public repos like this one, and I thought it might be useful for others to have a one-click button to run your repo, so I submitted a PR. Hope you find it useful!

    Thanks for your work, Dan

    opened by danzvara 1
  • Doubts about the TextCNN code

    Doubts about the TextCNN code

    class TextCNN(nn.Module):
        def __init__(self):
            super(TextCNN, self).__init__()
            self.num_filters_total = num_filters * len(filter_sizes)
            self.W = nn.Parameter(torch.empty(vocab_size, embedding_size).uniform_(-1, 1)).type(dtype)
            self.Weight = nn.Parameter(torch.empty(self.num_filters_total, num_classes).uniform_(-1, 1)).type(dtype)
            self.Bias = nn.Parameter(0.1 * torch.ones([num_classes])).type(dtype)

        def forward(self, X):
            embedded_chars = self.W[X] # [batch_size, sequence_length, embedding_size]
            embedded_chars = embedded_chars.unsqueeze(1) # add channel(=1) [batch, channel(=1), sequence_length, embedding_size]

            pooled_outputs = []
            for filter_size in filter_sizes:
                # conv : [input_channel(=1), output_channel(=3), (filter_height, filter_width), bias_option]
                conv = nn.Conv2d(1, num_filters, (filter_size, embedding_size), bias=True)(embedded_chars)
                h = F.relu(conv)
                # mp : ((filter_height, filter_width))
                mp = nn.MaxPool2d((sequence_length - filter_size + 1, 1))
                # pooled : [batch_size(=6), output_height(=1), output_width(=1), output_channel(=3)]
                pooled = mp(h).permute(0, 3, 2, 1)
                pooled_outputs.append(pooled)

            h_pool = torch.cat(pooled_outputs, len(filter_sizes)) # [batch_size(=6), output_height(=1), output_width(=1), output_channel(=3) * 3]
            h_pool_flat = torch.reshape(h_pool, [-1, self.num_filters_total]) # [batch_size(=6), output_height * output_width * (output_channel * 3)]

            model = torch.mm(h_pool_flat, self.Weight) + self.Bias # [batch_size, num_classes]
            return model
    

    I wonder if it's wrong to create the conv layers inside the loop?

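    Creating the conv layers inside forward means new, randomly initialized weights on every call that are never registered with the optimizer, so defining them once in __init__ (for example in an nn.ModuleList) is the usual fix. Below is a minimal sketch under assumed hyperparameter values, not the repository's exact code:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    embedding_size, sequence_length, num_classes = 2, 3, 2
    filter_sizes, num_filters, vocab_size = [2, 2, 2], 3, 16

    class TextCNN(nn.Module):
        def __init__(self):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, embedding_size)
            # create each conv once so its weights are registered and trained
            self.convs = nn.ModuleList([
                nn.Conv2d(1, num_filters, (size, embedding_size)) for size in filter_sizes
            ])
            self.fc = nn.Linear(num_filters * len(filter_sizes), num_classes)

        def forward(self, X):                     # X : [batch_size, sequence_length]
            emb = self.embedding(X).unsqueeze(1)  # [batch, 1, seq_len, emb_size]
            pooled = [F.relu(conv(emb)).max(dim=2)[0].squeeze(-1) for conv in self.convs]
            return self.fc(torch.cat(pooled, dim=1))  # [batch_size, num_classes]

    model = TextCNN()
    print(model(torch.randint(0, vocab_size, (6, sequence_length))).shape)  # torch.Size([6, 2])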
    opened by ZihaoZheng98 1
  • Why is the first parameter `src_vocab_size`?

    Why is the first parameter `src_vocab_size`?

    https://github.com/graykode/nlp-tutorial/blob/3b3a80dc63e69935731bcf09c951eb371692af8f/5-1.Transformer/Transformer(Greedy_decoder)-Torch.py#L141

    The position encoding table should be (src_len, d_model). Why is it (src_vocab_size, d_model) here?

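    For reference, a sinusoidal table only needs one row per position, so sizing it by the sequence length (plus a row for the padding position) would look roughly like this (a sketch, not the repository's exact code):

    import numpy as np
    import torch

    def sinusoid_table(n_position, d_model):
        # one row per position; sine on even dimensions, cosine on odd dimensions
        angles = np.array([[pos / np.power(10000, 2 * (i // 2) / d_model)
                            for i in range(d_model)] for pos in range(n_position)])
        angles[:, 0::2] = np.sin(angles[:, 0::2])
        angles[:, 1::2] = np.cos(angles[:, 1::2])
        return torch.FloatTensor(angles)

    src_len, d_model = 5, 512
    print(sinusoid_table(src_len + 1, d_model).shape)  # torch.Size([6, 512]), i.e. (src_len + 1, d_model)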
    opened by yangdechuan 1
  • A question about Autocomplete LSTM TensorFlow

    A question about Autocomplete LSTM TensorFlow

    In Autocomplete we already have:

    X = tf.placeholder(tf.float32, [None, n_step, n_class]) # [batch_size, n_step, n_class]
    Y = tf.placeholder(tf.float32, [None, n_class])         
    

    to guess the next missing character.

    1. How can I customize them to guess more than one character? I don't have any idea about multiplying a tensor by a tensor.
    2. In outputs, states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32), why is the shape of states always (2,), and what does the 2 really mean? Thank you for sharing the information.
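    On the second point, for a single BasicLSTMCell the states returned by dynamic_rnn is an LSTMStateTuple, so the leading 2 is just (cell state c, hidden state h). A minimal TF v1-style sketch (matching the archived TensorFlow code; sizes are illustrative):

    import tensorflow as tf  # TF v1 API, as in the archived tutorial code

    n_step, n_class, n_hidden = 3, 26, 128
    X = tf.placeholder(tf.float32, [None, n_step, n_class])

    cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden)
    outputs, states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)

    # states is an LSTMStateTuple: states.c (cell state) and states.h (hidden state),
    # each of shape [batch_size, n_hidden] -- hence the leading "2".
    print(states)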
    question 
    opened by nvlong198 1
  • Different embedding way

    Different embedding way

    In the code 'Seq2seq-torch.py', I saw you use np.eye (the one-hot representation) for the embedding, so I changed it to the usual way, using nn.Embedding(dict_length, embedding_dim), and it works. But the loss I got is very high. I want to ask about the differences between these two approaches. Here are my code and the result.

    [screenshots: code and training result]

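    For comparison, a minimal sketch of the two input styles (illustrative sizes; with nn.Embedding the vectors are learned rather than fixed, so loss curves and learning-rate sensitivity can differ from the one-hot case):

    import numpy as np
    import torch
    import torch.nn as nn

    n_class, embedding_dim = 10, 4
    indices = torch.tensor([[1, 3, 5]])                            # [batch_size, seq_len]

    # one-hot style (as in the tutorial's Seq2Seq code): each token is a fixed n_class-dim vector
    one_hot = torch.FloatTensor(np.eye(n_class)[indices.numpy()])  # [1, 3, n_class]

    # learned-embedding style: each token is a trainable embedding_dim vector
    dense = nn.Embedding(n_class, embedding_dim)(indices)          # [1, 3, embedding_dim]
    print(one_hot.shape, dense.shape)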
    help wanted 
    opened by lhbrichard 1
  • The Learning Rate in 5-2.BERT must be reduced.

    The Learning Rate in 5-2.BERT must be reduced.

    In Line 209:

    optimizer = optim.Adam(model.parameters(), lr=0.001)
    

    In practice, this BERT model is bound to get stuck at a poor local minimum if the learning rate is 0.001; I think the learning rate should be reduced to 0.0001. The experimental results show that with a learning rate of 0.0001 the loss drops to about 0.1 after roughly 100 iterations, while with a learning rate of 0.001 the loss almost never falls below 2.0.

    when lr=0.001

    Epoch: 0010 cost = 15.205759
    Epoch: 0020 cost = 16.236261
    Epoch: 0030 cost = 18.436878
    Epoch: 0040 cost = 4.077913
    Epoch: 0050 cost = 12.703120
    Epoch: 0060 cost = 10.411244
    Epoch: 0070 cost = 1.640913
    Epoch: 0080 cost = 10.753708
    Epoch: 0090 cost = 8.370532
    Epoch: 0100 cost = 1.624577
    Epoch: 0110 cost = 8.537676
    Epoch: 0120 cost = 7.453298
    Epoch: 0130 cost = 1.659591
    Epoch: 0140 cost = 7.092763
    Epoch: 0150 cost = 6.843360
    Epoch: 0160 cost = 1.688111
    Epoch: 0170 cost = 6.052425
    Epoch: 0180 cost = 6.395712
    Epoch: 0190 cost = 1.707749
    Epoch: 0200 cost = 5.263054
    ······
    Epoch: 5000 cost = 2.523541
    

    when lr=0.0001

    Epoch: 0010 cost = 13.998453
    Epoch: 0020 cost = 6.168099
    Epoch: 0030 cost = 3.504844
    Epoch: 0040 cost = 2.312538
    Epoch: 0050 cost = 1.723783
    Epoch: 0060 cost = 1.412463
    Epoch: 0070 cost = 0.930549
    Epoch: 0080 cost = 0.671946
    Epoch: 0090 cost = 0.745429
    Epoch: 0100 cost = 0.139699
    Epoch: 0110 cost = 0.187208
    Epoch: 0120 cost = 0.075726
    
    opened by Cheng0829 0
  • The Adam in 5-1.Transformer should be replaced by SGD

    The Adam in 5-1.Transformer should be replaced by SGD

    Line 202 : optimizer = optim.Adam(model.parameters(), lr=0.001)

    In practice, I think Adam performs quite badly here: at epoch 10 the cost is 1.6, and at epoch 100 or 1000 it is still 1.6. So I think we can change Adam to SGD, that is, optimizer = optim.SGD(model.parameters(), lr=0.001).

    Here are the effects of using SGD:

    Epoch: 0100 cost = 0.047965
    Epoch: 0200 cost = 0.020129
    Epoch: 0300 cost = 0.012563
    Epoch: 0400 cost = 0.009101
    Epoch: 0500 cost = 0.007131
    Epoch: 0600 cost = 0.005862
    Epoch: 0700 cost = 0.004978
    Epoch: 0800 cost = 0.004325
    Epoch: 0900 cost = 0.003823
    Epoch: 1000 cost = 0.003426
    
    opened by Cheng0829 0
  • Faster attention calculation in 4-2.Seq2Seq?

    Faster attention calculation in 4-2.Seq2Seq?

    Thanks for sharing! I just noticed that Attention.get_att_weight calculates attention in a for-loop; this looks rather slow, doesn't it?

    4-2.Seq2Seq(Attention)/Seq2Seq(Attention).ipynb

        def get_att_weight(self, dec_output, enc_outputs):  # get attention weight one 'dec_output' with 'enc_outputs'
            n_step = len(enc_outputs)
            attn_scores = torch.zeros(n_step)  # attn_scores : [n_step]
    
            for i in range(n_step):
                attn_scores[i] = self.get_att_score(dec_output, enc_outputs[i])
    
            # Normalize scores to weights in range 0 to 1
            return F.softmax(attn_scores).view(1, 1, -1)
    
        def get_att_score(self, dec_output, enc_output):  # enc_outputs [batch_size, num_directions(=1) * n_hidden]
            score = self.attn(enc_output)  # score : [batch_size, n_hidden]
            return torch.dot(dec_output.view(-1), score.view(-1))  # inner product make scalar value
    

    Suggested parallel version

        def get_att_weight(self, dec_output, enc_outputs):  # get attention weight one 'dec_output' with 'enc_outputs'
            n_step = len(enc_outputs)
            attn_scores = torch.zeros(n_step,device=self.device)  # attn_scores : [n_step]
    
            enc_t = self.attn(enc_outputs)
            score = dec_output.transpose(1,0).bmm(enc_t.transpose(1,0).transpose(2,1))
            out1   = score.softmax(-1)
            return out1
    
    
    opened by shouldsee 0
  • BiLSTM (TF) may have a mistake

    BiLSTM (TF) may have a mistake

    Calculating the attention score:

    # Attention
    outputs = tf.concat([output[0], output[1]], 2)  # output[0] : lstm_fw, output[1] : lstm_bw
    outputs = tf.transpose(outputs, [1, 0, 2])      # [n_step, batch_size, n_hidden]

    # only the output of the last time step is used
    final_hidden_state = outputs[-1]
    output_all = tf.concat([output[0], output[1]], 2)
    final_hidden_state = tf.expand_dims(final_hidden_state, 2)
    attn_weights = tf.squeeze(tf.matmul(output_all, final_hidden_state), 2)

    opened by cui-z 0
  • 5-1.Transformer may have a wrong position embedding

    5-1.Transformer may have a wrong position embedding

    1. In "class Encoder": enc_outputs = self.src_emb(enc_inputs) + self.pos_emb(torch.LongTensor([[1,2,3,4,0]]))

       I think it should be: enc_outputs = self.src_emb(enc_inputs) + self.pos_emb(torch.LongTensor([[0,1,2,3,4]]))

    2. In "class Decoder": dec_outputs = self.tgt_emb(dec_inputs) + self.pos_emb(torch.LongTensor([[5,1,2,3,4]]))

       I think it should be: dec_outputs = self.tgt_emb(dec_inputs) + self.pos_emb(torch.LongTensor([[0,1,2,3,4]]))

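    One way to avoid hard-coding the index lists entirely is to derive the position indices from the input length; a minimal sketch (variable names are illustrative):

    import torch

    enc_inputs = torch.tensor([[4, 7, 2, 9, 0]])                # [batch_size, src_len]
    positions = torch.arange(enc_inputs.size(1)).unsqueeze(0)   # tensor([[0, 1, 2, 3, 4]])
    # enc_outputs = self.src_emb(enc_inputs) + self.pos_emb(positions)
    print(positions)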
    opened by JiangHan97 0
  • 3-3.Bi-LSTM may have wrong padding

    3-3.Bi-LSTM may have wrong padding

    In line 16 you use input = input + [0] * (max_len - len(input)) for padding. You use 0, which corresponds to the first word 'Lorem', but that is not the right choice. I think you can change it like this:

        # word_dict = {w: i for i, w in enumerate(list(set(sentence.split())))}
        # number_dict = {i: w for i, w in enumerate(list(set(sentence.split())))}
        word_dict = {w: i for i, w in enumerate(['PAD']+list(set(sentence.split())))}
        number_dict = {i: w for i, w in enumerate(['PAD']+list(set(sentence.split())))}
    
    opened by ETWBC 0