Pytorch implementations of various Deep NLP models in cs-224n(Stanford Univ)

Overview

DeepNLP-models-Pytorch

Pytorch implementations of various Deep NLP models in cs-224n(Stanford Univ: NLP with Deep Learning)

  • This is not for PyTorch beginners. If this is your first time using PyTorch, I recommend these awesome tutorials.

  • If you're interested in DeepNLP, I strongly recommend working through this awesome lecture.

This material is not perfect, but it will help your study and research :) Please feel free to open pull requests!


Contents

Model Links
01. Skip-gram-Naive-Softmax [notebook / data / paper]
02. Skip-gram-Negative-Sampling [notebook / data / paper]
03. GloVe [notebook / data / paper]
04. Window-Classifier-for-NER [notebook / data / paper]
05. Neural-Dependancy-Parser [notebook / data / paper]
06. RNN-Language-Model [notebook / data / paper]
07. Neural-Machine-Translation-with-Attention [notebook / data / paper]
08. CNN-for-Text-Classification [notebook / data / paper]
09. Recursive-NN-for-Sentiment-Classification [notebook / data / paper]
10. Dynamic-Memory-Network-for-Question-Answering [notebook / data / paper]

Requirements

  • Python 3.5
  • Pytorch 0.2+
  • nltk 3.2.2
  • gensim 2.2.0
  • sklearn_crfsuite

Getting started

git clone https://github.com/DSKSD/cs-224n-Pytorch.git

prepare dataset

cd script
chmod u+x prepare_dataset.sh
./prepare_dataset.sh

docker env

Ubuntu 16.04 and Python 3.5.2 with various ML/DL packages, including tensorflow, sklearn, and pytorch

docker pull dsksd/deepstudy:0.2

pip3 install docker-compose
cd script
docker-compose up -d

cloud setting

not yet

References

Author

Sungdong Kim / @DSKSD

Comments
  • some data is not available now

    ` download dependency parser dataset... (clone from https://github.com/rguthrie3/DeepDependencyParsingProblemSet
    mkdir: created directory '../dataset/dparser'
    --2018-03-02 15:08:08-- https://raw.githubusercontent.com/rguthrie3/DeepDependencyParsingProblemSet/master/data/train.txt
    Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.12.133
    Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.12.133|:443... connected.
    HTTP request sent, awaiting response... 404 Not Found
    2018-03-02 15:08:09 ERROR 404: Not Found.

    --2018-03-02 15:08:09-- https://raw.githubusercontent.com/rguthrie3/DeepDependencyParsingProblemSet/master/data/vocab.txt
    Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.12.133
    Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.12.133|:443... connected.
    HTTP request sent, awaiting response... 404 Not Found
    2018-03-02 15:08:09 ERROR 404: Not Found.

    --2018-03-02 15:08:09-- https://raw.githubusercontent.com/rguthrie3/DeepDependencyParsingProblemSet/master/data/dev.txt
    Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.12.133
    Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.12.133|:443... connected.
    HTTP request sent, awaiting response... 404 Not Found
    2018-03-02 15:08:09 ERROR 404: Not Found. `

    It seems that @rguthrie3 has deleted the repo. Could you please point us to a new address? Thanks!

    opened by RasinGue 1
  • Not able to reproduce results for CNN-for-Text-Classification

    Hey,

    I am trying to reproduce this notebook, but the loss does not go down as advertised.

    [0/5] mean_loss : 1.80
    [1/5] mean_loss : 1.64
    [2/5] mean_loss : 1.64
    [3/5] mean_loss : 1.62
    [4/5] mean_loss : 1.76
    

    I am using PyTorch v0.4 on CUDA. My hypothesis is the newer version broke something.

    >torch.__version__
    '0.4.0a0+a3e9151'
    

    Thanks!

    opened by sksq96 1
  • update details to make more useful and readable

    I think this repository is really useful, so I spent a lot of time with it and found a few things to improve.

    This pull request mainly does the following 5 things:

    1. Make the code follow PEP 8. I noticed that your code follows PEP 8 in some places but not in others. To keep it consistent and readable for more people as a tutorial, I made all ten notebooks follow PEP 8.

    2. Add a random seed with random.seed(1024) to make the code repeatable. For a tutorial, repeatability is very important. I ran the notebook and got a different result every time; repeatable results make it easier to understand.

    3. Add GPU device_ids support with gpus=[0];torch.cuda.set_device(gpus[0]). Without this, PyTorch places tensors on the default CUDA device whenever torch.cuda.is_available(). Sometimes that GPU is already fully used, and the code then raises errors.

    4. Use dict.get(key) instead of if key in dict.keys() when building the word vocabulary. The get method is O(1), while in on dict.keys() is O(n) whenever keys() builds a list, as it does in Python 2; in Python 3 keys() is a view and membership is also O(1), so the gain there is mainly readability. The word vocabulary is usually huge.

    5. Use (3, 4, 5) instead of [3, 4, 5] in notebook 08:CNN-for-Text-classification->CNNClassifier. It is dangerous to use a mutable list as a function's default parameter, since it may cause unexpected errors. For safety, I replaced it with a tuple, which does not require changing any other lines.
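Points 2, 4 and 5 of the list above can be illustrated with a minimal standard-library sketch. It is not the pull request's actual diff: the helper names (`build_vocab`, `append_default_list`, `kernel_sizes_or_default`) and the `<PAD>` / `<UNK>` tokens are assumptions made up for illustration.

```python
import random


def build_vocab(tokens, specials=("<PAD>", "<UNK>")):
    """Map each distinct token to an integer index, in first-seen order."""
    word2index = {}
    for word in list(specials) + list(tokens):
        # dict.get is a single hash lookup; None means "not seen yet".
        # Comparing with None matters because index 0 is a valid value.
        if word2index.get(word) is None:
            word2index[word] = len(word2index)
    return word2index


def append_default_list(value, bucket=[]):
    # Pitfall: the default list is created once and shared across calls.
    bucket.append(value)
    return bucket


def kernel_sizes_or_default(kernel_sizes=(3, 4, 5)):
    # A tuple default cannot be mutated in place, so calls cannot leak state.
    return list(kernel_sizes)


random.seed(1024)  # same seed value as in the pull request
first_draw = random.random()
random.seed(1024)
second_draw = random.random()  # identical to first_draw

vocab = build_vocab("the cat sat on the mat".split())
shared_once = append_default_list(1)[:]   # [1]
shared_twice = append_default_list(2)[:]  # [1, 2]: state leaked between calls
```

Note that `random.seed` only seeds Python's own generator. PyTorch and NumPy keep separate generators, so a fully repeatable notebook would presumably also need `torch.manual_seed` and `numpy.random.seed`.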

    opened by oneTaken 0
  • about the negative example loss in the Skip-gram-Negative-Sampling algorithm

    I have learned a lot from this elegant project. Thanks a lot! Based on the equation of the Skip-gram-Negative-Sampling algorithm [image: 微信图片_20210425223243],

    I think the negative example loss, currently calculated by

    negative_score = torch.sum(neg_embeds.bmm(center_embeds.transpose(1, 2)).squeeze(2), 1).view(negs.size(0), -1) # BxK -> Bx1
    loss = self.logsigmoid(positive_score) + self.logsigmoid(negative_score)

    should maybe be changed to

    negative_score = neg_embeds.bmm(center_embeds.transpose(1, 2))
    loss = self.logsigmoid(positive_score) + torch.sum(self.logsigmoid(negative_score), 1)

    since, based on the equation, the negative_score first goes through a logsigmoid operation and is then summed.
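The order of the two operations matters because log-sigmoid is not linear. The standard negative-sampling objective for one center word is log σ(u_o · v_c) + Σ_k log σ(−u_k · v_c), which is presumably what the missing image shows. The sketch below uses plain Python scalars as a stand-in for the tensors in the notebook. The scores are made-up values, and the negative scores are assumed to be already negated.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


# Hypothetical scores for one center word and K = 3 negative samples.
positive_score = 2.0
negative_scores = [-1.5, 0.5, -0.2]  # assumed already negated

# Variant quoted from the notebook: sum the K scores, then one logsigmoid.
sum_then_log = log_sigmoid(positive_score) + log_sigmoid(sum(negative_scores))

# Variant proposed in the issue: logsigmoid per negative sample, then sum.
log_then_sum = log_sigmoid(positive_score) + sum(
    log_sigmoid(score) for score in negative_scores
)

# The two objectives differ, so they yield different losses and gradients.
objectives_differ = abs(sum_then_log - log_then_sum) > 1e-6
```

With these numbers the two variants give different values, so the choice changes what is being optimized. Only the second one matches the per-sample sum in the objective above.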

    opened by xiaopengguo 0
  • 1. fixing q-type leak 2. tweaks to run with PyTorch 1.x

    Hi, thank you for the great repository!

    However, I found that for QA type classification, the code includes the sub-type in the training/testing data. Needless to say, it is a perfect predictor. So I am fixing this, plus making a couple of tweaks to make it compatible with the latest Python.

    I couldn't get rid of the annoying GenSim warning, but maybe you'll have more luck if you re-run the notebook.

    Predictably, after removing the leak the accuracy drops. However, I verified that with ALL the data it goes up again, at least when you randomly sample 10% of the training set as test data. The numbers may be different if you use the official test set. However, your data download script does not download all the data, so I kept this part as is.

    BTW, it takes only a few seconds to train on a MacBook pro (and a couple of minutes on all the data). Not long at all!
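The sub-type leak described in this pull request can be sketched as follows. This assumes each data line has the form `COARSE:fine question text`. The sample line is made up, and both function names are hypothetical, not taken from the notebook or the pull request.

```python
def parse_trec_line_leaky(line):
    """Keep everything after the colon, so the sub-type becomes a feature."""
    label, _, rest = line.strip().partition(":")
    return label, rest.split()


def parse_trec_line(line):
    """Split 'COARSE:fine question text' into (coarse_label, question_tokens)."""
    label, _, rest = line.strip().partition(":")
    tokens = rest.split()
    # tokens[0] is the fine-grained sub-type; dropping it removes the leak.
    return label, tokens[1:]


sample = "HUM:ind Who wrote the example notebook ?"  # made-up line
leaky_label, leaky_tokens = parse_trec_line_leaky(sample)
clean_label, clean_tokens = parse_trec_line(sample)
```

In the leaky version the first input token (`ind`) almost determines the label (`HUM`), which would explain an unrealistically high accuracy.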

    opened by searchivarius 0
  • about padding sequence

    Hi, in file 08.CNN-for-Text-Classification.ipynb, where do you pad the input? Is it in [110], line 7: x_p.append(torch.cat([x[i], Variable(LongTensor([word2index['']] * (max_x - x[i].size(1)))).view(1, -1)], 1))? Thanks!
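The quoted line appears to right-pad each sequence in a batch up to the longest one. A minimal sketch of that idea with plain Python lists instead of tensors follows. The `pad_batch` helper, the `<PAD>` token name and the tiny vocabulary are assumptions for illustration, not code from the notebook.

```python
def pad_batch(sequences, pad_index):
    """Right-pad every index sequence to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_index] * (max_len - len(seq)) for seq in sequences]


word2index = {"<PAD>": 0, "what": 1, "is": 2, "pytorch": 3}  # hypothetical
batch = [[1, 2, 3], [1], [2, 3]]
padded = pad_batch(batch, word2index["<PAD>"])
# padded == [[1, 2, 3], [1, 0, 0], [2, 3, 0]]
```

After padding, all rows have the same length, so they can be stacked into a single batch tensor.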

    opened by ShellingFord221 0
  • about pretrained embeddings

    Hi, I have a small question about file 08.CNN-for-Text-Classification.ipynb, [96], line 4: pretrained.append(model[word2index[key]]). word2index[key] looks up the key's index, which is then used to find its pretrained embedding in GoogleNews-vectors-negative300.bin. But the indices in this bin file are presumably different from the indices generated from the TREC dataset, i.e. model[key's index] may not be this key's (word's) embedding. Thanks!
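If the concern raised here holds, one way to avoid the mismatch is to query the pretrained vectors by the word string and use the dataset's own index only to decide which row to fill. The sketch below uses a plain dict as a hypothetical stand-in for the loaded word2vec model. `build_embedding_rows` and the toy vectors are illustrative assumptions.

```python
def build_embedding_rows(word2index, pretrained, dim):
    """Return one vector per vocabulary index, looked up by word string."""
    rows = [[0.0] * dim for _ in range(len(word2index))]
    for word, index in word2index.items():
        vector = pretrained.get(word)  # key is the word itself, not its index
        if vector is not None:
            rows[index] = list(vector)
        # Words missing from the pretrained table keep their zero vector here;
        # random initialization would be another reasonable choice.
    return rows


# Hypothetical stand-in for a loaded word2vec model: word -> vector.
pretrained = {"what": [0.1, 0.2], "city": [0.3, 0.4]}
word2index = {"<PAD>": 0, "city": 1, "what": 2, "zzz": 3}
rows = build_embedding_rows(word2index, pretrained, dim=2)
```

Row `i` of the result then belongs to the word whose dataset index is `i`, regardless of how the pretrained file orders its vocabulary.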

    opened by ShellingFord221 1
  • 08. CNN-for-Text-Classification LogSoftmax and Cross-entropy

    Hello,

    Thank you very much for sharing such good material. It has been a great help while I study this topic.

    I am writing this issue to ask about the apparent duplication of logsoftmax and cross-entropy in the 08.CNN example.

    The output of CNNClassifier is described as returning the result of applying log_softmax to the model's output values.

    Later, the model's output is stored in a variable called pred and passed as input to loss_function (Cross-Entropy), but as far as I know, PyTorch's Cross-Entropy function takes as input the raw scores before they pass through softmax.

    So I am asking whether the example code ends up applying softmax twice.

    Thank you.
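The question in this issue, whether softmax ends up applied twice, can be checked numerically. PyTorch's `nn.CrossEntropyLoss` combines a log-softmax with a negative log-likelihood, so feeding it log-probabilities applies log-softmax a second time. The sketch below re-implements both steps in plain Python, as an illustration rather than PyTorch's actual code. The logits are made-up values.

```python
import math


def log_softmax(scores):
    """log(softmax(scores)), using the max-shift trick for stability."""
    shift = max(scores)
    log_norm = shift + math.log(sum(math.exp(s - shift) for s in scores))
    return [s - log_norm for s in scores]


def cross_entropy(scores, target):
    """Softmax cross-entropy for one example: -log_softmax(scores)[target]."""
    return -log_softmax(scores)[target]


raw_scores = [2.0, -1.0, 0.5]  # hypothetical logits for three classes
once = log_softmax(raw_scores)
twice = log_softmax(once)  # log-softmax applied to its own output

loss_on_raw = cross_entropy(raw_scores, target=0)
loss_on_log_probs = cross_entropy(once, target=0)  # the pattern in the issue
```

Log-softmax is idempotent, since the exponentials of log-probabilities already sum to 1. The second application therefore changes nothing beyond floating-point rounding, and both losses agree. The extra `log_softmax` is redundant rather than harmful. Returning raw scores to `nn.CrossEntropyLoss`, or keeping `log_softmax` and switching to `nn.NLLLoss`, would remove the duplication.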

    opened by DonghyungKo 2
  • How to save the model for Neural Machine Translation?

    I want to save the model for Neural Machine Translation (https://nbviewer.jupyter.org/github/DSKSD/DeepNLP-models-Pytorch/blob/master/notebooks/07.Neural-Machine-Translation-with-Attention.ipynb). Can you help me?

    opened by wannaphong 1
Owner
Kim SungDong
Naver AI LAB Researcher Interested in NLP / Representation Learning / Reinforcement Learning