nlp-tutorial is a tutorial for those who are studying NLP (Natural Language Processing) using PyTorch

Overview

nlp-tutorial

nlp-tutorial is a tutorial for those who are studying NLP (Natural Language Processing) using PyTorch. Most of the models in NLP are implemented in fewer than 100 lines of code (excluding comments and blank lines).

  • [08-14-2020] Old TensorFlow v1 code is archived in the archive folder. For beginner readability, only PyTorch version 1.0 or higher is supported.

Curriculum - (Example Purpose)

1. Basic Embedding Model

2. CNN(Convolutional Neural Network)

3. RNN(Recurrent Neural Network)

4. Attention Mechanism

5. Model based on Transformer

Dependencies

  • Python 3.5+
  • PyTorch 1.0.0+

Author

  • Tae Hwan Jung(Jeff Jung) @graykode
  • Author Email : [email protected]
  • Acknowledgements to mojitok for the NLP Research Internship.
Comments
  • A question about seq2seq-torch.py at line 43

    A question about seq2seq-torch.py at line 43

    Hi, I'm an NLP rookie and I want to ask you a question. Your code extracts the input (context) with a fixed window around line 43, and "word sequence" is a list of sentences, so some words may take their neighbour words from different sentences. Does this harm the results?

    And my training results don't seem very good, even though I didn't change the code. [image]

    If you see this issue, please answer when you have free time. Although my English is poor, I still want to express my gratitude to you.

    question 
    opened by MowanHu 4
  • Added greedy decoder input

    Added greedy decoder input

    In this PR, I added a greedy decoder function that generates the decoder input for inference. This is important for translating sentences, as we don't know the target input beforehand. In the paper, they mention that they ran beam search with k = 4; in the greedy approach, k = 1.

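    For reference, a minimal sketch of the idea (assuming a model whose forward pass takes the encoder input plus the decoder tokens produced so far, and a hypothetical start_symbol; this is not the exact code from the PR):

    import torch

    def greedy_decode(model, enc_input, start_symbol, tgt_len):
        # begin with only the start symbol and append the most likely token each step (k = 1)
        dec_input = torch.full((1, 1), start_symbol, dtype=torch.long)
        for _ in range(tgt_len):
            logits = model(enc_input, dec_input)         # [1, cur_len, tgt_vocab_size]
            next_token = logits[:, -1].argmax(dim=-1)    # greedy pick for the last position
            dec_input = torch.cat([dec_input, next_token.unsqueeze(0)], dim=1)
        return dec_input[:, 1:]                          # generated target tokens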
    opened by dmmiller612 4
  • Remove autograd.Variable and fix symbol P

    Remove autograd.Variable and fix symbol P

    1. The Variable API has been deprecated, so you can remove it. I provided an example with the basic attention model; if you agree, feel free to adjust it. https://pytorch.org/docs/stable/autograd.html#variable-deprecated

    2. The sequence-to-sequence model has an n_step variable (I guess you mean padding, right?), but I am a little confused that you use the same terminology, even though this is a tutorial.

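    For context, a minimal sketch of the change (since PyTorch 0.4, plain tensors carry autograd state, so the Variable wrapper is unnecessary):

    import torch
    # from torch.autograd import Variable   # deprecated; Variable(tensor) now just returns a Tensor

    # old style: x = Variable(torch.zeros(3, 5), requires_grad=True)
    x = torch.zeros(3, 5, requires_grad=True)   # current style: request gradients directly
    y = (x * 2).sum()
    y.backward()
    print(x.grad)                               # gradients are populated without any wrapper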
    opened by SpellOnYou 2
  • Seq2Seq PyTorch

    Seq2Seq PyTorch

    Hi, thanks for sharing your code.

    I've read your seq2seq implementation, and I was wondering about the RNN Encoder-Decoder model.

    In the paper 'Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation', they say:

    [figure from the paper]

    proposed gating unit: [figure]

    and I couldn't find the new hidden-state activation function in your code.

    Do you have any plan to add the proposed activation process, or is it okay to just skip that part?

    Thank you so much in advance.

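    For reference, the gating unit proposed in that paper is what later became the GRU. Below is a minimal hand-written sketch of the candidate ("new") hidden-state activation it describes, with weights as plain tensors and biases omitted for brevity; PyTorch's nn.GRU implements equivalent gating internally, so swapping nn.RNN for nn.GRU would also cover it.

    import torch

    def gru_cell(x, h_prev, W_r, U_r, W_z, U_z, W_h, U_h):
        r = torch.sigmoid(x @ W_r + h_prev @ U_r)           # reset gate
        z = torch.sigmoid(x @ W_z + h_prev @ U_z)           # update gate
        h_tilde = torch.tanh(x @ W_h + (r * h_prev) @ U_h)  # candidate hidden-state activation
        return z * h_prev + (1 - z) * h_tilde               # interpolate previous and candidate states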
    question 
    opened by DonghyungKo 2
  • Updated the transformer greedy method

    Updated the transformer greedy method

    In this PR, I added a greedy decoder function that generates the decoder input for inference. This is important for translating sentences, as we don't know the target input beforehand. In the paper, they mention that they ran beam search with k = 4; in the greedy approach, k = 1.

    opened by dmmiller612 2
  • Fix comment errors in NNLM

    Fix comment errors in NNLM

    Small mistakes in NNLM:

    X = self.C(X) # X : [batch_size, n_step, n_class]
    X = X.view(-1, n_step * m) # [batch_size, n_step * n_class]
    

    should be

    X = self.C(X) # X : [batch_size, n_step, m]
    X = X.view(-1, n_step * m) # [batch_size, n_step * m]
    

    #48

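    A quick shape check (assuming, as in the tutorial's NNLM, that self.C is an nn.Embedding(n_class, m) lookup):

    import torch
    import torch.nn as nn

    n_class, m, n_step, batch_size = 7, 2, 2, 3
    C = nn.Embedding(n_class, m)
    X = torch.randint(0, n_class, (batch_size, n_step))  # token indices
    emb = C(X)                       # [batch_size, n_step, m] -- last dim is m, not n_class
    flat = emb.view(-1, n_step * m)  # [batch_size, n_step * m]
    print(emb.shape, flat.shape)     # torch.Size([3, 2, 2]) torch.Size([3, 4])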
    opened by secsilm 1
  • Add "Launch in Deepnote" button

    Add "Launch in Deepnote" button

    Hi there, I'm Dan from Deepnote. We're a small startup working on a new, Jupyter-compatible data science notebook. I was trying out Deepnote with some cool public repos like this one, and I thought it might be useful for others to have a one-click button to run your repo, so I submitted a PR. Hope you find it useful!

    Thanks for your work, Dan

    opened by danzvara 1
  • Doubts about the TextCNN code

    Doubts about the TextCNN code

    class TextCNN(nn.Module):
        def __init__(self):
            super(TextCNN, self).__init__()
            self.num_filters_total = num_filters * len(filter_sizes)
            self.W = nn.Parameter(torch.empty(vocab_size, embedding_size).uniform_(-1, 1)).type(dtype)
            self.Weight = nn.Parameter(torch.empty(self.num_filters_total, num_classes).uniform_(-1, 1)).type(dtype)
            self.Bias = nn.Parameter(0.1 * torch.ones([num_classes])).type(dtype)

        def forward(self, X):
            embedded_chars = self.W[X] # [batch_size, sequence_length, embedding_size]
            embedded_chars = embedded_chars.unsqueeze(1) # add channel(=1) [batch, channel(=1), sequence_length, embedding_size]

            pooled_outputs = []
            for filter_size in filter_sizes:
                # conv : [input_channel(=1), output_channel(=3), (filter_height, filter_width), bias_option]
                conv = nn.Conv2d(1, num_filters, (filter_size, embedding_size), bias=True)(embedded_chars)
                h = F.relu(conv)
                # mp : ((filter_height, filter_width))
                mp = nn.MaxPool2d((sequence_length - filter_size + 1, 1))
                # pooled : [batch_size(=6), output_height(=1), output_width(=1), output_channel(=3)]
                pooled = mp(h).permute(0, 3, 2, 1)
                pooled_outputs.append(pooled)

            h_pool = torch.cat(pooled_outputs, len(filter_sizes)) # [batch_size(=6), output_height(=1), output_width(=1), output_channel(=3) * 3]
            h_pool_flat = torch.reshape(h_pool, [-1, self.num_filters_total]) # [batch_size(=6), output_height * output_width * (output_channel * 3)]

            model = torch.mm(h_pool_flat, self.Weight) + self.Bias # [batch_size, num_classes]
            return model
    

    I wonder if it's wrong to create the conv layers inside the loop?

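    Creating the conv layers inside forward means new, randomly initialized weights on every call that are never registered with the optimizer, so defining them once in __init__ (for example in an nn.ModuleList) is the usual fix. Below is a minimal sketch under assumed hyperparameter values, not the repository's exact code:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    embedding_size, sequence_length, num_classes = 2, 3, 2
    filter_sizes, num_filters, vocab_size = [2, 2, 2], 3, 16

    class TextCNN(nn.Module):
        def __init__(self):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, embedding_size)
            # create each conv once so its weights are registered and trained
            self.convs = nn.ModuleList([
                nn.Conv2d(1, num_filters, (size, embedding_size)) for size in filter_sizes
            ])
            self.fc = nn.Linear(num_filters * len(filter_sizes), num_classes)

        def forward(self, X):                     # X : [batch_size, sequence_length]
            emb = self.embedding(X).unsqueeze(1)  # [batch, 1, seq_len, emb_size]
            pooled = [F.relu(conv(emb)).max(dim=2)[0].squeeze(-1) for conv in self.convs]
            return self.fc(torch.cat(pooled, dim=1))  # [batch_size, num_classes]

    model = TextCNN()
    print(model(torch.randint(0, vocab_size, (6, sequence_length))).shape)  # torch.Size([6, 2])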
    opened by ZihaoZheng98 1
  • Why is the first parameter `src_vocab_size`?

    Why is the first parameter `src_vocab_size`?

    https://github.com/graykode/nlp-tutorial/blob/3b3a80dc63e69935731bcf09c951eb371692af8f/5-1.Transformer/Transformer(Greedy_decoder)-Torch.py#L141

    The position encoding table should be (src_len, d_model). Why is it (src_vocab_size, d_model) here?

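    For reference, a sinusoidal table only needs one row per position, so sizing it by the sequence length (plus a row for the padding position) would look roughly like this (a sketch, not the repository's exact code):

    import numpy as np
    import torch

    def sinusoid_table(n_position, d_model):
        # one row per position; sine on even dimensions, cosine on odd dimensions
        angles = np.array([[pos / np.power(10000, 2 * (i // 2) / d_model)
                            for i in range(d_model)] for pos in range(n_position)])
        angles[:, 0::2] = np.sin(angles[:, 0::2])
        angles[:, 1::2] = np.cos(angles[:, 1::2])
        return torch.FloatTensor(angles)

    src_len, d_model = 5, 512
    print(sinusoid_table(src_len + 1, d_model).shape)  # torch.Size([6, 512]), i.e. (src_len + 1, d_model)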
    opened by yangdechuan 1
  • A question about Autocomplete LSTM TensorFlow

    A question about Autocomplete LSTM TensorFlow

    In Autocomplete we already have:

    X = tf.placeholder(tf.float32, [None, n_step, n_class]) # [batch_size, n_step, n_class]
    Y = tf.placeholder(tf.float32, [None, n_class])         
    

    to guess the next missing character.

    1. How can I customize them to guess more than one character? I don't have any idea about multiplying a tensor by a tensor.
    2. In outputs, states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32), why is the shape of states always (2,), and what does the 2 really mean? Thank you for sharing the information.
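    On the second point, for a single BasicLSTMCell the states returned by dynamic_rnn is an LSTMStateTuple, so the leading 2 is just (cell state c, hidden state h). A minimal TF v1-style sketch (matching the archived TensorFlow code; sizes are illustrative):

    import tensorflow as tf  # TF v1 API, as in the archived tutorial code

    n_step, n_class, n_hidden = 3, 26, 128
    X = tf.placeholder(tf.float32, [None, n_step, n_class])

    cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden)
    outputs, states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)

    # states is an LSTMStateTuple: states.c (cell state) and states.h (hidden state),
    # each of shape [batch_size, n_hidden] -- hence the leading "2".
    print(states)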
    question 
    opened by nvlong198 1
  • Different embedding way

    Different embedding way

    In the code 'Seq2seq-torch.py', I saw you use np.eye (the one-hot representation) for the embedding, so I changed it to the usual way, using nn.Embedding(dict_length, embedding_dim), and it works. But the loss I got is very high. I want to ask about the differences between these two approaches. Here are my code and the result.

    [screenshots: code and training result]

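    For comparison, a minimal sketch of the two input styles (illustrative sizes; with nn.Embedding the vectors are learned rather than fixed, so loss curves and learning-rate sensitivity can differ from the one-hot case):

    import numpy as np
    import torch
    import torch.nn as nn

    n_class, embedding_dim = 10, 4
    indices = torch.tensor([[1, 3, 5]])                            # [batch_size, seq_len]

    # one-hot style (as in the tutorial's Seq2Seq code): each token is a fixed n_class-dim vector
    one_hot = torch.FloatTensor(np.eye(n_class)[indices.numpy()])  # [1, 3, n_class]

    # learned-embedding style: each token is a trainable embedding_dim vector
    dense = nn.Embedding(n_class, embedding_dim)(indices)          # [1, 3, embedding_dim]
    print(one_hot.shape, dense.shape)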
    help wanted 
    opened by lhbrichard 1
  • The Learning Rate in 5-2.BERT must be reduced.

    The Learning Rate in 5-2.BERT must be reduced.

    In Line 209:

    optimizer = optim.Adam(model.parameters(), lr=0.001)
    

    In practice, this BERT model is bound to get stuck at a poor local minimum if the learning rate is 0.001; I think the learning rate should be reduced to 0.0001. The experimental results show that with a learning rate of 0.0001 the loss drops to about 0.1 after roughly 100 iterations, while with a learning rate of 0.001 the loss almost never falls below 2.0.

    when lr=0.001

    Epoch: 0010 cost = 15.205759
    Epoch: 0020 cost = 16.236261
    Epoch: 0030 cost = 18.436878
    Epoch: 0040 cost = 4.077913
    Epoch: 0050 cost = 12.703120
    Epoch: 0060 cost = 10.411244
    Epoch: 0070 cost = 1.640913
    Epoch: 0080 cost = 10.753708
    Epoch: 0090 cost = 8.370532
    Epoch: 0100 cost = 1.624577
    Epoch: 0110 cost = 8.537676
    Epoch: 0120 cost = 7.453298
    Epoch: 0130 cost = 1.659591
    Epoch: 0140 cost = 7.092763
    Epoch: 0150 cost = 6.843360
    Epoch: 0160 cost = 1.688111
    Epoch: 0170 cost = 6.052425
    Epoch: 0180 cost = 6.395712
    Epoch: 0190 cost = 1.707749
    Epoch: 0200 cost = 5.263054
    ······
    Epoch: 5000 cost = 2.523541
    

    when lr=0.0001

    Epoch: 0010 cost = 13.998453
    Epoch: 0020 cost = 6.168099
    Epoch: 0030 cost = 3.504844
    Epoch: 0040 cost = 2.312538
    Epoch: 0050 cost = 1.723783
    Epoch: 0060 cost = 1.412463
    Epoch: 0070 cost = 0.930549
    Epoch: 0080 cost = 0.671946
    Epoch: 0090 cost = 0.745429
    Epoch: 0100 cost = 0.139699
    Epoch: 0110 cost = 0.187208
    Epoch: 0120 cost = 0.075726
    
    opened by Cheng0829 0
  • The Adam in 5-1.Transformer should be replaced by SGD

    The Adam in 5-1.Transformer should be replaced by SGD

    Line 202 : optimizer = optim.Adam(model.parameters(), lr=0.001)

    In practice, I think Adam performs quite badly here: at epoch 10 the cost is 1.6, and at epoch 100 or 1000 it is still 1.6. So I think we can change Adam to SGD, that is, optimizer = optim.SGD(model.parameters(), lr=0.001).

    Here are the effects of using SGD:

    Epoch: 0100 cost = 0.047965
    Epoch: 0200 cost = 0.020129
    Epoch: 0300 cost = 0.012563
    Epoch: 0400 cost = 0.009101
    Epoch: 0500 cost = 0.007131
    Epoch: 0600 cost = 0.005862
    Epoch: 0700 cost = 0.004978
    Epoch: 0800 cost = 0.004325
    Epoch: 0900 cost = 0.003823
    Epoch: 1000 cost = 0.003426
    
    opened by Cheng0829 0
  • Faster attention calculation in 4-2.Seq2Seq?

    Faster attention calculation in 4-2.Seq2Seq?

    Thanks for sharing! I just noticed that Attention.get_att_weight calculates attention in a for-loop; this looks rather slow, doesn't it?

    4-2.Seq2Seq(Attention)/Seq2Seq(Attention).ipynb

        def get_att_weight(self, dec_output, enc_outputs):  # get attention weight one 'dec_output' with 'enc_outputs'
            n_step = len(enc_outputs)
            attn_scores = torch.zeros(n_step)  # attn_scores : [n_step]
    
            for i in range(n_step):
                attn_scores[i] = self.get_att_score(dec_output, enc_outputs[i])
    
            # Normalize scores to weights in range 0 to 1
            return F.softmax(attn_scores).view(1, 1, -1)
    
        def get_att_score(self, dec_output, enc_output):  # enc_outputs [batch_size, num_directions(=1) * n_hidden]
            score = self.attn(enc_output)  # score : [batch_size, n_hidden]
            return torch.dot(dec_output.view(-1), score.view(-1))  # inner product make scalar value
    

    Suggested parallel version

        def get_att_weight(self, dec_output, enc_outputs):  # get attention weight one 'dec_output' with 'enc_outputs'
            n_step = len(enc_outputs)
            attn_scores = torch.zeros(n_step,device=self.device)  # attn_scores : [n_step]
    
            enc_t = self.attn(enc_outputs)
            score = dec_output.transpose(1,0).bmm(enc_t.transpose(1,0).transpose(2,1))
            out1   = score.softmax(-1)
            return out1
    
    
    opened by shouldsee 0
  • BiLSTM (TF) may have a mistake

    BiLSTM (TF) may have a mistake

    Calculating the attention score:

    # Attention
    outputs = tf.concat([output[0], output[1]], 2)  # output[0] : lstm_fw, output[1] : lstm_bw
    outputs = tf.transpose(outputs, [1, 0, 2])      # [n_step, batch_size, n_hidden]

    # only the output of the last time step is used
    final_hidden_state = outputs[-1]
    output_all = tf.concat([output[0], output[1]], 2)
    final_hidden_state = tf.expand_dims(final_hidden_state, 2)
    attn_weights = tf.squeeze(tf.matmul(output_all, final_hidden_state), 2)

    opened by cui-z 0
  • 5-1.Transformer may have a wrong position embedding

    5-1.Transformer may have a wrong position embedding

    1. In "class Encoder": enc_outputs = self.src_emb(enc_inputs) + self.pos_emb(torch.LongTensor([[1,2,3,4,0]]))

       I think it should be: enc_outputs = self.src_emb(enc_inputs) + self.pos_emb(torch.LongTensor([[0,1,2,3,4]]))

    2. In "class Decoder": dec_outputs = self.tgt_emb(dec_inputs) + self.pos_emb(torch.LongTensor([[5,1,2,3,4]]))

       I think it should be: dec_outputs = self.tgt_emb(dec_inputs) + self.pos_emb(torch.LongTensor([[0,1,2,3,4]]))

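    One way to avoid hard-coding the index lists entirely is to derive the position indices from the input length; a minimal sketch (variable names are illustrative):

    import torch

    enc_inputs = torch.tensor([[4, 7, 2, 9, 0]])                # [batch_size, src_len]
    positions = torch.arange(enc_inputs.size(1)).unsqueeze(0)   # tensor([[0, 1, 2, 3, 4]])
    # enc_outputs = self.src_emb(enc_inputs) + self.pos_emb(positions)
    print(positions)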
    opened by JiangHan97 0
  • 3-3.Bi-LSTM may have wrong padding

    3-3.Bi-LSTM may have wrong padding

    In line 16 you use input = input + [0] * (max_len - len(input)) for padding. You use 0, which corresponds to the first word 'Lorem', but that is not the right choice. I think you can change it like this:

        # word_dict = {w: i for i, w in enumerate(list(set(sentence.split())))}
        # number_dict = {i: w for i, w in enumerate(list(set(sentence.split())))}
        word_dict = {w: i for i, w in enumerate(['PAD']+list(set(sentence.split())))}
        number_dict = {i: w for i, w in enumerate(['PAD']+list(set(sentence.split())))}
    
    opened by ETWBC 0