An IPython Notebook tutorial on deep learning for natural language processing, including structure prediction.

Overview

Table of Contents:

  1. Introduction to Torch's Tensor Library
  2. Computation Graphs and Automatic Differentiation
  3. Deep Learning Building Blocks: Affine maps, non-linearities, and objectives
  4. Optimization and Training
  5. Creating Network Components in Pytorch
     • Example: Logistic Regression Bag-of-Words text classifier
  6. Word Embeddings: Encoding Lexical Semantics
     • Example: N-Gram Language Modeling
     • Exercise: Continuous Bag-of-Words for learning word embeddings
  7. Sequence Modeling and Long Short-Term Memory Networks
     • Example: An LSTM for Part-of-Speech Tagging
     • Exercise: Augmenting the LSTM tagger with character-level features
  8. Advanced: Dynamic Toolkits, Dynamic Programming, and the BiLSTM-CRF
     • Example: Bi-LSTM Conditional Random Field for named-entity recognition
     • Exercise: A new loss function for discriminative tagging

What is this tutorial?

I am writing this tutorial because, although there are plenty of other tutorials out there, they all seem to have one of three problems:

  • They have a lot of content on computer vision and conv nets, which is irrelevant for most NLP (although conv nets have been applied in cool ways to NLP problems).
  • Pytorch is brand new, so many deep learning for NLP tutorials target older frameworks, and usually not dynamic frameworks like Pytorch, which have a totally different flavor.
  • The examples don't move beyond RNN language models to show the awesome stuff you can do with linguistic structure prediction. I think this is a problem, because Pytorch's dynamic graphs make structure prediction one of its biggest strengths.

Specifically, I am writing this tutorial for a Natural Language Processing class at Georgia Tech, to ease into a problem set I wrote for the class on deep transition parsing. The problem set uses some advanced techniques. The intention of this tutorial is to cover the basics, so that students can focus on the more challenging aspects of the problem set. The aim is to start with the basics and move up to linguistic structure prediction, which I feel is almost completely absent in other Pytorch tutorials. The general deep learning basics have short expositions; topics that are more NLP-specific receive more in-depth discussion, although I have referred to other sources when I felt a full description would be reinventing the wheel and take up too much space.

Dependency Parsing Problem Set

As mentioned above, here is the problem set that goes through implementing a high-performing dependency parser in Pytorch. I wanted to add a link here since it might be useful, provided you ignore the things that were specific to the class. A few notes:

  • There is a lot of code, so the beginning of the problem set is mainly to get people familiar with the way my code represents the relevant data and the interfaces you need to use. The rest of the problem set is actually implementing components for the parser. Since we hadn't done deep learning in the class before, I tried to provide an enormous amount of comments and hints when writing it.
  • There is a unit test for every deliverable, which you can run with nosetests.
  • Since we use this problem set in the class, please don't publicly post solutions.
  • The same repo has some notes that include a section on shift-reduce dependency parsing, if you are looking for a written source to complement the problem set.
  • The link above might not work if it is taken down at the start of a new semester.

References:

  • I learned a lot about deep structure prediction at EMNLP 2016 from this tutorial on Dynet, given by Chris Dyer and Graham Neubig of CMU and Yoav Goldberg of Bar-Ilan University. Dynet is a great package, especially if you want to use C++ and avoid dynamic typing. The final BiLSTM CRF exercise and the character-level features exercise are things I learned from this tutorial.
  • A great book on structure prediction is Linguistic Structure Prediction by Noah Smith. It doesn't use deep learning, but that is ok.
  • The best deep learning book I am aware of is Deep Learning, which is by some major contributors to the field and very comprehensive, although there is not an NLP focus. It is free online, but worth having on your shelf.

Exercises:

There are a few exercises in the tutorial, which are either to implement a popular model (CBOW) or augment one of my models. The character-level features exercise especially is very non-trivial, but very useful (I can't quote the exact numbers, but I have run the experiment before and usually the character-level features increase accuracy by 2-3%). Since they aren't simple exercises, I will soon implement them myself and add them to the repo. For the CBOW exercise, a minimal sketch of the model follows below.
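If you want a starting point for the CBOW exercise, here is a minimal sketch of what such a model might look like in Pytorch. This is an illustration only, not the repo's official solution; the class name and the sum-then-project design are my assumptions:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CBOW(nn.Module):
        # Predict a target word from the sum of its context word embeddings.
        def __init__(self, vocab_size, embedding_dim):
            super(CBOW, self).__init__()
            self.embeddings = nn.Embedding(vocab_size, embedding_dim)
            self.linear = nn.Linear(embedding_dim, vocab_size)

        def forward(self, context_idxs):
            # context_idxs: LongTensor of word indices for the context window
            embeds = self.embeddings(context_idxs).sum(dim=0)
            out = self.linear(embeds.view(1, -1))
            return F.log_softmax(out, dim=1)  # pair with nn.NLLLoss

Training then follows the same loop as the N-Gram example: build (context, target) pairs, compute log-probabilities, and minimize the negative log-likelihood of the target word.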

Suggestions:

Please open a GitHub issue if you find any mistakes or think there is a particular model that would be useful to add.

Comments
  • Bi-LSTM+CRF error

    Hello, I tried to run the BiLSTM_CRF example. I added one example to training_data:

        training_data = [
            ("the wall reported apple corporation made money".split(), "B I O B I O O".split()),
            ("georgia tech is a university in georgia".split(), "B I O O O O B".split()),
            ("China".split(), "B".split()),
        ]

    Then, when I run

        precheck_sent = prepare_sequence(training_data[2][0], word_to_ix)
        print model(precheck_sent)

    I get the output 3, which means START_TAG.

    I then tried to change the code of the function _viterbi_decode from

        terminal_var = forward_var + self.transitions[self.tag_to_ix[STOP_TAG]]
        best_tag_id = argmax(terminal_var)
        path_score = terminal_var[0][best_tag_id]
        best_path = [best_tag_id]

    to

        terminal_var = forward_var + self.transitions[self.tag_to_ix[STOP_TAG]]
        temp = terminal_var.data
        temp[0][self.tag_to_ix[STOP_TAG]] = -10000
        temp[0][self.tag_to_ix[START_TAG]] = -10000
        temp = torch.autograd.Variable(temp)
        best_tag_id = argmax(temp)
        path_score = temp[0][best_tag_id]
        best_path = [best_tag_id]

    and then I get the correct output 1, which means 'B'.

    I would like to know why it produces a completely wrong answer, START_TAG. Thank you!
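    For readers who hit the same thing: whatever the root cause, a defensive pattern is to mask the special tags before the argmax so the decoder can never emit them. A minimal sketch (not the repo's official fix), reusing the example's argmax helper and tag_to_ix:

        # Forbid START_TAG/STOP_TAG at decode time by masking their scores.
        # Assumes terminal_var is a 1 x num_tags tensor of path scores.
        masked = terminal_var.clone()
        masked[0][self.tag_to_ix[START_TAG]] = -10000.0
        masked[0][self.tag_to_ix[STOP_TAG]] = -10000.0
        best_tag_id = argmax(masked)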

    opened by ZhixiuYe 6
  • Official pytorch tutorials

    Hi,

    We've included your tutorial in the official pytorch tutorials here. It'll be useful for the community to have all the awesome tutorials in one place. I also think that HTML documentation is more approachable than a notebook.

    The current page is almost exactly the same as your notebook. I've just raised a PR to break the tutorial into a few parts to make each bite-sized. Would you approve of such a reorganisation? Your feedback would be valuable.

    You can contact me at [email protected]

    Sasank.

    opened by chsasank 2
  • Fails on 2nd last cell

    The notebook fails on the second-to-last cell:

        model = BiLSTM_CRF(len(word_to_ix), tag_to_ix, EMBEDDING_DIM, HIDDEN_DIM)
        optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

    Kindly fix.

    Jayanta/Kolkata/India

        TypeError                                 Traceback (most recent call last)
        ----> 1 model = BiLSTM_CRF(len(word_to_ix), tag_to_ix, EMBEDDING_DIM, HIDDEN_DIM)
              2 optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

        in __init__(self, vocab_size, tag_to_ix, embedding_dim, hidden_dim)
             27
             28         self.word_embeds = nn.Embedding(vocab_size, embedding_dim)
        ---> 29         self.lstm = nn.LSTM(embedding_dim, hidden_dim/2, num_layers=1, bidirectional=True)
             30
             31         # Maps the output of the LSTM into tag space.

        ~/anaconda3/lib/python3.6/site-packages/torch/nn/modules/rnn.py in __init__(self, *args, **kwargs)
            370
            371     def __init__(self, *args, **kwargs):
        --> 372         super(LSTM, self).__init__('LSTM', *args, **kwargs)
            373
            374

        ~/anaconda3/lib/python3.6/site-packages/torch/nn/modules/rnn.py in __init__(self, mode, input_size, hidden_size, num_layers, bias, batch_first, dropout, bidirectional)
             37             layer_input_size = input_size if layer == 0 else hidden_size * num_directions
             38
        ---> 39             w_ih = Parameter(torch.Tensor(gate_size, layer_input_size))
             40             w_hh = Parameter(torch.Tensor(gate_size, hidden_size))
             41             b_ih = Parameter(torch.Tensor(gate_size))

        TypeError: torch.FloatTensor constructor received an invalid combination of arguments - got (float, int), but expected one of:
         * no arguments
         * (int ...) didn't match because some of the arguments have invalid types: (float, int)
         * (torch.FloatTensor viewed_tensor)
         * (torch.Size size)
         * (torch.FloatStorage data)
         * (Sequence data)

    Check predictions before training:

        precheck_sent = prepare_sequence(training_data[0][0], word_to_ix)
        precheck_tags = torch.LongTensor([tag_to_ix[t] for t in training_data[0][1]])
        print(model(precheck_sent))
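    For readers who hit this: the traceback points at hidden_dim/2, which is a float under Python 3's true division, and torch.Tensor then rejects a float dimension. The likely fix (assuming you run the notebook under Python 3) is integer division in BiLSTM_CRF.__init__:

        # Use floor division so hidden_size is an int under Python 3.
        self.lstm = nn.LSTM(embedding_dim, hidden_dim // 2,
                            num_layers=1, bidirectional=True)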

    opened by ojncwyog 1
  • Python 2 to 3 Please

    This very wonderful notebook is in Python 2.

    It is mostly a matter of the parentheses in print().

    Can you please convert it to Python 3? For those who prefer Python 2, add from __future__ import unicode_literals, print_function, division.
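    For reference, that compatibility header, placed at the top of the notebook, looks like this (a standard Python idiom, not specific to this repo):

        # Make print a function and / true division under Python 2,
        # matching Python 3 behavior.
        from __future__ import unicode_literals, print_function, division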

    Regards Jayanta/Kolkata/India

    opened by ojncwyog 1
  • I think you meant...

    "Matrices and vectors are special cases of torch.Tensors, where their dimension is 1 and 2 respectively." I think you meant 2 and 1 respectively. Thanks

    opened by joepalermo 1
  • .creator deprecated

    The .creator attribute of the autograd.Variable class appears to have been renamed .grad_fn in the newest release of Pytorch. Thanks for the great tutorial.
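    A quick illustration of the rename (Variable-era API, as in the tutorial; on newer Pytorch the same attribute lives on plain tensors):

        import torch
        from torch.autograd import Variable

        x = Variable(torch.ones(2), requires_grad=True)
        y = x + 1
        # Older releases exposed the producing operation as y.creator;
        # it is now called y.grad_fn.
        print(y.grad_fn)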

    opened by sarahwie 1
  • Batch Data Loading and Processing

    Are there any plans to convert some of the code base (like the LSTM/NER models) to allow for minibatch processing? Currently, they all take one instance at a time. Any pointers in that regard would be useful as well.
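    A common starting point, sketched here rather than offered as a drop-in change to the tutorial code: pad the sequences in a batch and wrap them with pack_padded_sequence so the LSTM skips the padding (this assumes a reasonably recent Pytorch, where enforce_sorted is available):

        import torch
        import torch.nn as nn
        from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

        # Toy batch: three index sequences of different lengths.
        seqs = [torch.LongTensor([1, 2, 3, 4]),
                torch.LongTensor([5, 6]),
                torch.LongTensor([7, 8, 9])]
        lengths = torch.LongTensor([len(s) for s in seqs])

        embedding = nn.Embedding(10, 8)
        lstm = nn.LSTM(8, 16, batch_first=True)

        padded = pad_sequence(seqs, batch_first=True)   # (batch, max_len)
        embeds = embedding(padded)                      # (batch, max_len, 8)
        packed = pack_padded_sequence(embeds, lengths, batch_first=True,
                                      enforce_sorted=False)
        packed_out, _ = lstm(packed)
        out, _ = pad_packed_sequence(packed_out, batch_first=True)  # (batch, max_len, 16)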

    opened by manasRK 1
  • `word_to_ix` on the wrong input?

    Hi,

    I'm confused with the way you build word_to_ix here: https://github.com/rguthrie3/DeepLearningForNLPInPytorch/blame/master/Deep%20Learning%20for%20Natural%20Language%20Processing%20with%20Pytorch.ipynb#L1448

    Is there any reason why you don't use the vocab or set(raw_text) instead, like you do here: https://github.com/rguthrie3/DeepLearningForNLPInPytorch/blame/master/Deep%20Learning%20for%20Natural%20Language%20Processing%20with%20Pytorch.ipynb#L1298 ?

    opened by gusmonod 1
  • LSTM postagging example

    I think you should also pass in the hidden state and cell state, otherwise the forward method complains:

        lstm_out, self.hidden = self.lstm(embeds.view(len(sentence), 1, -1), self.hidden)
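    In the tagger example that hidden state is an (h_0, c_0) pair. A minimal sketch of the pattern, with shapes assumed from the single-layer, unidirectional, batch-size-1 setup of the example:

        import torch

        def init_hidden(hidden_dim):
            # (h_0, c_0), each (num_layers * num_directions, batch, hidden_dim)
            return (torch.zeros(1, 1, hidden_dim),
                    torch.zeros(1, 1, hidden_dim))

        # inside forward():
        #     self.hidden = init_hidden(self.hidden_dim)  # reset per sentence
        #     lstm_out, self.hidden = self.lstm(
        #         embeds.view(len(sentence), 1, -1), self.hidden)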

    opened by gozdesahin 1
  • Pytorch tutorials exercises

    Hi Robert,

    I followed your Pytorch tutorials for NLP and implemented the 2 exercises you proposed: the CBOW model and the character-level enriching of the word embeddings in the POS tagger.

    I would appreciate it if you could review and comment on my implementation, and perhaps we could discuss it.

    Here is the repo's link: https://github.com/MokaddemMouna/Pytorch

    Thanks.

    opened by MokaddemMouna 0
  • Lack of sync with pytorch.org tutorial

    http://pytorch.org/tutorials/beginner/deep_learning_nlp_tutorial.html has 5 nice notebooks, all in Python 3.

    The one on GitHub is in Python 2 and not in sync.

    This may create difficulty for many, as it did for me. You may want to keep one version, perhaps the 5 working ones.

    opened by ojncwyog 0
  • Solution Timeline

    Hi, it is very nice of you to write such a great tutorial. I got here from the official Pytorch tutorials, and I think it can help a lot of people. But do you have any timeline for posting solutions to the exercises? I find these exercises useful, but I am not sure whether my answers are right, since I am not in the area of NLP. So it would be extremely nice if you could give a simple but instructive solution.

    Thank you!

    opened by pengkaizhu 3
Owner
Robert
Software engineer at Citadel LLC and Georgia Tech grad. Primarily interested in natural language processing, finance, and high-performance computing.