Tree LSTM implementation in PyTorch

Overview

Tree-Structured Long Short-Term Memory Networks

This is a PyTorch implementation of Tree-LSTM as described in the paper Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks by Kai Sheng Tai, Richard Socher, and Christopher Manning. On the semantic similarity task using the SICK dataset, this implementation reaches:

  • Pearson's coefficient: 0.8492 and MSE: 0.2842 using hyperparameters --lr 0.010 --wd 0.0001 --optim adagrad --batchsize 25
  • Pearson's coefficient: 0.8674 and MSE: 0.2536 using hyperparameters --lr 0.025 --wd 0.0001 --optim adagrad --batchsize 25 --freeze_embed
  • Pearson's coefficient: 0.8676 and MSE: 0.2532 are the numbers reported in the original paper.
  • Known differences include the way the gradients are accumulated (normalized by batchsize or not).

Requirements

  • Python (tested on 3.6.5, should work on >=2.7)
  • Java >= 8 (for Stanford CoreNLP utilities)
  • Other dependencies are in requirements.txt. Note: this currently works with PyTorch 0.4.0; switch to the pytorch-v0.3.1 branch if you want to use PyTorch 0.3.1.

Usage

Before delving into how to run the code, here is a quick overview of the contents:

  • Use the script fetch_and_preprocess.sh to download the SICK dataset, the Stanford Parser and Stanford POS Tagger, and GloVe word vectors (Common Crawl 840B -- warning: this is a 2GB download!), and to preprocess the data, i.e. generate dependency parses using the Stanford Neural Network Dependency Parser.
  • main.py does the actual heavy lifting of training the model and testing it on the SICK dataset. For a list of all command-line arguments, have a look at config.py.
    • The first run caches GloVe embeddings for words in the SICK vocabulary. Later runs only read in this cache.
    • Logs and model checkpoints are saved to the checkpoints/ directory with the name specified by the command line argument --expname.

Next, here are the different ways to run the code to train a TreeLSTM model.

Local Python Environment

If you have a working Python3 environment, simply run the following sequence of steps:

- bash fetch_and_preprocess.sh
- pip install -r requirements.txt
- python main.py

Pure Docker Environment

If you want to use a Docker container, simply follow these steps:

- docker build -t treelstm .
- docker run -it treelstm bash
- bash fetch_and_preprocess.sh
- python main.py

Local Filesystem + Docker Environment

If you want to use a Docker container, but want to persist data and checkpoints in your local filesystem, simply follow these steps:

- bash fetch_and_preprocess.sh
- docker build -t treelstm .
- docker run -it --mount type=bind,source="$(pwd)",target="/root/treelstm.pytorch" treelstm bash
- python main.py

NOTE: Setting the environment variable OMP_NUM_THREADS=1 usually gives a speedup on the CPU. Use it like OMP_NUM_THREADS=1 python main.py. To run on a GPU, set the CUDA_VISIBLE_DEVICES environment variable instead. Usually, CUDA does not give much of a speedup here, since we are operating at a batchsize of 1.
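
The same thread limit can also be set from inside Python via PyTorch's own API; a minimal sketch (equivalent in effect to OMP_NUM_THREADS=1 for intra-op parallelism):

    import torch

    # Restrict PyTorch's intra-op parallelism to a single thread, which
    # usually helps when traversing trees one sample at a time on the CPU.
    torch.set_num_threads(1)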

Notes

  • (Apr 02, 2018) Added Dockerfile
  • (Apr 02, 2018) Now works on PyTorch 0.3.1 and Python 3.6, removed dependency on Python 2.7
  • (Nov 28, 2017) Added frozen embeddings, closed gap to paper.
  • (Nov 08, 2017) Refactored model to get 1.5x - 2x speedup.
  • (Oct 23, 2017) Now works with PyTorch 0.2.0.
  • (May 04, 2017) Added support for sparse tensors. Using the --sparse argument will enable sparse gradient updates for nn.Embedding, potentially reducing memory usage.
    • There are a couple of caveats, however: weight decay will not work in conjunction with sparsity, and results from the original paper might not be reproduced using sparse embeddings (see the sketch below).
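
Below is a minimal sketch of what the --sparse flag enables, not the repo's actual code; the names and dimensions are illustrative (the SICK vocabulary has 2412 words, and the GloVe Common Crawl 840B vectors are 300-dimensional):

    import torch
    import torch.nn as nn

    # sparse=True makes nn.Embedding produce sparse gradients.
    emb = nn.Embedding(num_embeddings=2412, embedding_dim=300, sparse=True)

    # Adagrad accepts sparse gradients, but weight decay must stay at 0:
    # PyTorch raises an error if weight_decay is combined with sparse gradients.
    optimizer = torch.optim.Adagrad(emb.parameters(), lr=0.025, weight_decay=0)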

Acknowledgements

Shout-out to Kai Sheng Tai for the original LuaTorch implementation, and to the PyTorch team for the fun library.

Contact

Riddhiman Dasgupta

This is my first PyTorch-based implementation, and it might contain bugs. Please let me know if you find any!

License

MIT

Comments
  • No such file or directory: data/sick/train/a.toks

    Hi, I need help! IOError: [Errno 2] No such file or directory: 'data/sick/train/a.toks' is thrown when I run python main.py with default parameters.

    Where does this file a.toks come from?

    opened by adrianhust 7
  • map_label_to_target should init zero tensor

    Your map_label_to_target for the SICK dataset initializes a random (uninitialized) tensor:

    def map_label_to_target(label,num_classes):
        target = torch.Tensor(1,num_classes) # this is not zero tensor
        ceil = int(math.ceil(label))
        floor = int(math.floor(label))
        if ceil==floor:
            target[0][floor-1] = 1
        else:
            target[0][floor-1] = ceil - label
            target[0][ceil-1] = label - floor
        return target
    

    However, in the original treelstm, the author initializes a zero tensor:

    local targets = torch.zeros(batch_size, self.num_classes)
    for j = 1, batch_size do
      local sim = dataset.labels[indices[i + j - 1]] * (self.num_classes - 1) + 1
      local ceil, floor = math.ceil(sim), math.floor(sim)
      if ceil == floor then
        targets[{j, floor}] = 1
      else
        targets[{j, floor}] = ceil - sim
        targets[{j, ceil}] = sim - floor
      end
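
    A fixed version (a minimal sketch mirroring the Lua reference above) would start from a zero tensor:

    import math
    import torch

    def map_label_to_target(label, num_classes):
        # Zero-initialize so classes that are not explicitly set carry no random values.
        target = torch.zeros(1, num_classes)
        ceil = int(math.ceil(label))
        floor = int(math.floor(label))
        if ceil == floor:
            target[0][floor - 1] = 1
        else:
            target[0][floor - 1] = ceil - label
            target[0][ceil - 1] = label - floor
        return target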
    
    opened by ttpro1995 5
  • Matrix problem

      File ".../treelstm.pytorch/model.py", line 36, in node_forward
        u = F.tanh(self.ux(inputs)+self.uh(child_h_sum))
      File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
        result = self.forward(*input, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/linear.py", line 54, in forward
        return self._backend.Linear.apply(input, self.weight, self.bias)
      File "/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/linear.py", line 12, in forward
        output.addmm_(0, 1, input, weight.t())
    RuntimeError: matrix and matrix expected at 
    

    I think you may have missed the unsqueeze operation:

            i = F.sigmoid(self.ix(inputs)+self.ih(child_h_sum.unsqueeze(0)))
            o = F.sigmoid(self.ox(inputs)+self.oh(child_h_sum.unsqueeze(0)))
            u = F.tanh(self.ux(inputs)+self.uh(child_h_sum.unsqueeze(0)))
    
    opened by gujiuxiang 3
  • can not find packages

    lib\CollapseUnaryTransformer.java:3: error: package edu.stanford.nlp.ling does not exist
    import edu.stanford.nlp.ling.Label;
    ^
    lib\CollapseUnaryTransformer.java:4: error: package edu.stanford.nlp.trees does not exist
    import edu.stanford.nlp.trees.Tree;
    ^
    lib\CollapseUnaryTransformer.java:5: error: package edu.stanford.nlp.trees does not exist
    import edu.stanford.nlp.trees.TreeTransformer;
    ^
    lib\CollapseUnaryTransformer.java:6: error: package edu.stanford.nlp.util does not exist
    import edu.stanford.nlp.util.Generics;

    ... What can I do?

    opened by venusafroid 2
  • how to run it in GPU???

    I ran pip install -r req....

    but I couldn't run it with python main.py --cuda

    The traceback says: AssertionError: Torch not compiled with CUDA enabled

    opened by herbertchen1 2
  • Can the

    I ran the sentiment model successfully. My GPUs are two 1080 Tis, and I get 14% utilization on gpu0. Is there a way to run it on multiple GPUs? I implemented a model in TensorFlow Fold, but it seems that it can't support multi-GPU.

    opened by KazuhiraDZ 2
  • Sizes do not match

    When I run python main.py, I get the following error message:

    Namespace(batchsize=25, cuda=True, data='data/sick/', epochs=15, expname='test', glove='data/glove/', hidden_dim=50, input_dim=150, lr=0.01, mem_dim=75, num_classes=5, optim='adagrad', save='checkpoints/', seed=123, sparse=False, wd=0.0001)
    ==> SICK vocabulary size : 2412
    ==> Size of train data   : 4500
    ==> Size of dev data     : 500
    ==> Size of test data    : 4927
    Traceback (most recent call last):
      File "main.py", line 157, in <module>
        main()
      File "main.py", line 126, in main
        model.childsumtreelstm.emb.state_dict()['weight'].copy_(emb)
    RuntimeError: sizes do not match at /py/conda-bld/pytorch_1493676237139/work/torch/lib/THC/THCTensorCopy.cu:31

    The platform is Arch Linux with CUDA 8.0.

    I would appreciate any reply.

    opened by Jarvx 2
  • How to make it with dynamic batching?

    This implementation can only process one sample at a time, so performance is limited and GPU utilization is low. Is there a possibility of making treelstm support dynamic batching so that the GPU can be fully utilized?

    opened by xuehy 2
  • ChildSumTreeLSTM: fx and fh linear layers are declared but never used

    Lines 21, 22:

    self.fx = nn.Linear(self.in_dim,self.mem_dim)
    self.fh = nn.Linear(self.mem_dim,self.mem_dim)
    

    But they are never used.

    I think you intended to use them in lines 38, 39 (perhaps a typo of ix for fx):

    fx = F.torch.unsqueeze(self.ix(inputs),1)
    f = F.torch.cat([self.ih(child_hi)+fx for child_hi in child_h], 0)
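
    The presumably intended version (a sketch, assuming the typo diagnosis above is correct) routes the forget gate through fx and fh instead:

    fx = F.torch.unsqueeze(self.fx(inputs), 1)
    f = F.torch.cat([self.fh(child_hi) + fx for child_hi in child_h], 0)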
    
    
    opened by ttpro1995 2
  • classpath error

    My current environment: Windows 10, Python 3.6, PyTorch 0.4, PyCharm IDE. I try to run preprocess-sick.py and get an error: cannot find or load the class. Then I copied the java command to a Windows cmd window, and the same error was raised. Error code lines:

        cmd = ('java -cp %s DependencyParse -tokpath %s -parentpath %s -relpath %s %s < %s'
               % (cp, tokpath, parentpath, relpath, tokenize_flag, filepath))
        os.system(cmd)
    
    
    opened by dgai91 1
  • Fix RuntimeError: a leaf Variable that requires grad has been used in…

    I just tried to run the training locally, and hit the following error when not freezing the embedding:

    Traceback (most recent call last):
      File "main.py", line 189, in <module>
        main()
      File "main.py", line 138, in main
        emb)
      File "/Users/Jizg/git/treelstm.pytorch/treelstm/model.py", line 76, in __init__
        self.emb.weight.copy_(init_emb)
    RuntimeError: a leaf Variable that requires grad has been used in an in-place operation.

    I updated the main script a little bit and it seems to be OK now.

    opened by jizg 1
  • IndexError: index 54 is out of bounds for dimension 0 with size 54

    tree.state = self.node_forward(inputs[tree.idx], child_c, child_h)
    

    len(inputs) == 54, tree.idx == 54

    https://github.com/dasguptar/treelstm.pytorch/blob/228a314add09fc7f39ea752aa7b1fcf756cfe277/treelstm/dataset.py#L70

    More information:
    
    inputs[tree.idx] tensor([[ 3.7410e-02,  5.7619e-02,  3.3822e-01,  ..., -3.5774e-02,
             -7.8579e-02,  1.0644e-02],
            [-2.5287e-02, -2.5835e-01, -7.5715e-02,  ...,  1.2864e-01,
              1.3856e-01,  3.3581e-01],
            [-5.4430e-02, -1.6442e-01, -6.7605e-02,  ...,  1.7388e-01,
             -3.9886e-01, -1.3006e-02],
            ...,
            [-2.5433e-02, -8.0709e-02,  6.2163e-01,  ...,  2.7345e-01,
             -5.6782e-02,  1.8956e-01],
            [-2.4587e-01,  8.9087e-03, -1.5240e-03,  ..., -3.2474e-01,
              1.1630e-02, -1.3252e-01],
            [ 4.9405e-04, -3.5795e-01, -2.2226e-01,  ..., -9.1428e-02,
              2.2649e-01, -2.0806e-01]], device='cuda:0',
           grad_fn=<EmbeddingBackward>)
    Traceback (most recent call last):
      File "main.py", line 185, in <module>
        main()
      File "main.py", line 155, in main
        train_loss = trainer.train(train_dataset)
      File "/home/qingdujun/public/runtime/models/treelstm.pytorch/treelstm/trainer.py", line 29, in train
        output = self.model(linput, rtree, rinput)
      File "/home/qingdujun/Applications/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/qingdujun/public/runtime/models/treelstm.pytorch/treelstm/model.py", line 90, in forward
        rstate, rhidden = self.childsumtreelstm(rtree, rinputs)
      File "/home/qingdujun/Applications/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/qingdujun/public/runtime/models/treelstm.pytorch/treelstm/model.py", line 38, in forward
        self.forward(tree.children[idx], inputs)
      File "/home/qingdujun/public/runtime/models/treelstm.pytorch/treelstm/model.py", line 38, in forward
        self.forward(tree.children[idx], inputs)
      File "/home/qingdujun/public/runtime/models/treelstm.pytorch/treelstm/model.py", line 38, in forward
        self.forward(tree.children[idx], inputs)
      [Previous line repeated 10 more times]
      File "/home/qingdujun/public/runtime/models/treelstm.pytorch/treelstm/model.py", line 48, in forward
        tree.state = self.node_forward(inputs[tree.idx], child_c, child_h)
    IndexError: index 54 is out of bounds for dimension 0 with size 54
    
    opened by qingdujun 0
  • Error while Compiling

    Ubuntu 18.04, Java 11.0.3. Running (as part of fetch_and_preprocess.sh): javac -cp $CLASSPATH lib/*.java -Xlint:unchecked

    lib/CollapseUnaryTransformer.java:17: error: error while writing CollapseUnaryTransformer: /home/eduard_ergenzinger/treelstm.pytorch/lib/CollapseUnaryTransformer.class
    public class CollapseUnaryTransformer implements TreeTransformer {
           ^
    lib/ConstituencyParse.java:58: warning: [unchecked] unchecked call to PTBTokenizer(Reader,LexedTokenFactory<T>,String) as a member of the raw type PTBTokenizer
          PTBTokenizer<Word> tokenizer = new PTBTokenizer(new StringReader(line), new WordTokenFactory(), "");
                                         ^
      where T is a type-variable:
        T extends HasWord declared in class PTBTokenizer
    lib/ConstituencyParse.java:58: warning: [unchecked] unchecked conversion
          PTBTokenizer<Word> tokenizer = new PTBTokenizer(new StringReader(line), new WordTokenFactory(), "");
                                         ^
      required: PTBTokenizer<Word>
      found:    PTBTokenizer
    lib/DependencyParse.java:57: warning: [unchecked] unchecked call to PTBTokenizer(Reader,LexedTokenFactory<T>,String) as a member of the raw type PTBTokenizer
            PTBTokenizer<Word> tokenizer = new PTBTokenizer(
                                           ^
      where T is a type-variable:
        T extends HasWord declared in class PTBTokenizer
    lib/DependencyParse.java:57: warning: [unchecked] unchecked conversion
            PTBTokenizer<Word> tokenizer = new PTBTokenizer(
                                           ^
      required: PTBTokenizer<Word>
      found:    PTBTokenizer
    1 error
    4 warnings
    opened by 121eddie 3
  • Nodes' hidden representations?

    Hello, not an issue, but what's the easiest way to extract the learned hidden embeddings for each node in a ChildSum tree? New to PyTorch, so forgive my ignorance.

    Thanks!
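
    A minimal sketch, assuming the model's forward pass has already run so that every node's (c, h) pair is stored on tree.state (see the snippet quoted in the batch-size question below):

    def collect_hidden_states(tree, states=None):
        # Recursively gather the hidden vector h of every node; tree.state
        # holds the (c, h) pair computed by node_forward during the forward pass.
        if states is None:
            states = []
        for idx in range(tree.num_children):
            collect_hidden_states(tree.children[idx], states)
        c, h = tree.state
        states.append(h)
        return states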

    opened by chriswtanner 0
  • Batch support for TreeLSTM

    The existing implementation doesn't support forward/backward with a batch of trees as input, which makes training and inference slow. This pull request adds batch support for TreeLSTM and reproduces exactly the same results as without batching.

    To run with batch:

    - python main.py --use_batch
    
    opened by jinfengr 0
  • Does current TreeLSTM support batch size?

    It seems batch sizes are still not supported in the code? In the forward function of ChildSumTreeLSTM, it seems that only a single tree is processed per forward pass.


    def forward(self, tree, inputs):
        # Recurse into all children first, so their states are available below.
        for idx in range(tree.num_children):
            self.forward(tree.children[idx], inputs)

        if tree.num_children == 0:
            # Leaf node: use zero-initialized child states.
            child_c = inputs[0].detach().new(1, self.mem_dim).fill_(0.).requires_grad_()
            child_h = inputs[0].detach().new(1, self.mem_dim).fill_(0.).requires_grad_()
        else:
            # Internal node: stack the (c, h) states of all children.
            child_c, child_h = zip(*map(lambda x: x.state, tree.children))
            child_c, child_h = torch.cat(child_c, dim=0), torch.cat(child_h, dim=0)

        tree.state = self.node_forward(inputs[tree.idx], child_c, child_h)
        return tree.state
    


    opened by jinfengr 0
Owner
Riddhiman Dasgupta
Deep Learning, Science Fiction, Comic Books