Code for the paper "Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks"

Overview

ON-LSTM

This repository contains the code used for the word-level language modeling and unsupervised parsing experiments in the paper Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks, originally forked from the LSTM and QRNN Language Model Toolkit for PyTorch. If you use this code or our results in your research, we'd appreciate it if you cite our paper as follows:

@article{shen2018ordered,
  title={Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks},
  author={Shen, Yikang and Tan, Shawn and Sordoni, Alessandro and Courville, Aaron},
  journal={arXiv preprint arXiv:1810.09536},
  year={2018}
}

Software Requirements

Python 3.6, NLTK and PyTorch 0.4 are required for the current codebase.

Steps

  1. Install PyTorch 0.4 and NLTK

  2. Download the PTB data. Note that the two tasks, i.e., language modeling and unsupervised parsing, share the same model structure but require different formats of the PTB data. For language modeling we need the standard 10,000-word Penn Treebank corpus, and for parsing we need the Penn Treebank Parsed data.

  3. Scripts and commands

    • Train the language model: python main.py --batch_size 20 --dropout 0.45 --dropouth 0.3 --dropouti 0.5 --wdrop 0.45 --chunk_size 10 --seed 141 --epoch 1000 --data /path/to/your/data

    • Test unsupervised parsing: python test_phrase_grammar.py --cuda

    The default setting in main.py achieves a perplexity of approximately 56.17 on the PTB test set and an unlabeled F1 of approximately 47.7 on the WSJ test set.
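
    For reference, the perplexity that main.py logs is just the exponential of the average per-token cross-entropy (in nats); a minimal sketch of the conversion, with a hypothetical loss value:

      import math

      # main.py reports math.exp(cur_loss); perplexity is the exponential
      # of the mean cross-entropy per token, measured in nats.
      avg_cross_entropy = 4.0284  # hypothetical per-token loss
      perplexity = math.exp(avg_cross_entropy)
      print(round(perplexity, 2))  # ~56.17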

Comments
  • Data Directory used when running test_phrase_grammar.py

    Hi Yikang and other Contributors,

    Thank you for making the source code public! I am trying to reproduce your results, but I am not sure what path to use as the command-line argument test_phrase_grammar.py --data. I downloaded the PTB data and I am currently using treebank_3/parsed/mrg as the data argument, but it does not work.

    The listings under treebank_3/parsed/mrg: atis brown readme.mrg swbd wsj

    The listings under treebank_3/parsed/mrg/wsj:

    00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 MERGE.LOG

    Thank you for your time! Ian

    opened by YianZhang 9
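
    As a hedged sanity check related to the question above, NLTK's BracketParseCorpusReader can read .mrg files directly from a treebank_3/parsed/mrg/wsj directory; whether test_phrase_grammar.py expects this exact layout is an assumption here:

      from nltk.corpus.reader import BracketParseCorpusReader

      # Hypothetical root; adjust to wherever treebank_3 lives on your machine.
      root = 'treebank_3/parsed/mrg/wsj'
      reader = BracketParseCorpusReader(root, r'\d\d/wsj_.*\.mrg')
      print(len(reader.fileids()))     # number of .mrg files discovered
      print(reader.parsed_sents()[0])  # first parse tree, if any were found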
  • question about unidirectional

    In your paper, you use a unidirectional ON-LSTM to train a language model and then parse phrase grammar using the output distances of the pretrained language model. How can we explain that the level of the first token is independent of future tokens? Is there any bidirectional way to do it?

    opened by bojone 3
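
    For context on the question above: the paper induces a tree from the per-gap syntactic distances with a greedy top-down split. A minimal sketch of that procedure (a simplified stand-in, not the repository's exact code):

      def build_tree(words, gaps):
          # words: n tokens; gaps: n - 1 syntactic distances between adjacent tokens.
          # Split at the largest distance, then recurse on each side.
          if len(words) == 1:
              return words[0]
          k = max(range(len(gaps)), key=gaps.__getitem__)
          return [build_tree(words[:k + 1], gaps[:k]),
                  build_tree(words[k + 1:], gaps[k + 1:])]

      # Monotonically decreasing gaps yield a right-branching tree:
      print(build_tree(['a', 'b', 'c', 'd'], [3, 2, 1]))  # ['a', ['b', ['c', 'd']]]

    Under this procedure, monotonically decreasing distances always split at the first gap, which is the right-branching baseline discussed in a later issue.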
  • Question about dataset construction

    Hello Yikang:

    Hi, I'm Yangming Li, a research intern at the HIT-SCIR lab. Thank you for your contribution with this repository. However, I found some problems with the dataset construction (including the test set):

    1. The use of the PyTorch API "narrow" unexpectedly discards some words and results in an incorrect PPL score (see the sketch after this issue).

    2. It seems that your sliding window over the whole corpus is not contiguous and thus generates far less data than usual.

    Thanks again for your contribution with this repository. Yangming, 19/08/02

    opened by LeePleased 2
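
    Regarding point 1 above: the batching pattern inherited from the AWD-LSTM toolkit trims the corpus so it divides evenly into batch columns, which is where words are dropped. A rough sketch of that pattern (simplified, not verbatim from this repository):

      import torch

      def batchify(data, bsz):
          # data: 1-D LongTensor holding the token ids of the whole corpus.
          nbatch = data.size(0) // bsz
          # narrow() keeps only the first nbatch * bsz tokens; the remainder
          # (up to bsz - 1 tokens) is silently discarded, as the issue notes.
          data = data.narrow(0, 0, nbatch * bsz)
          return data.view(bsz, -1).t().contiguous()  # shape: (nbatch, bsz)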
  • How to train parsing

    Hi, I wonder how to train the parser. main.py seems to be only for training the LM.

    Besides, when I try testing the parsing, python test_phrase_grammar.py --cuda gives the error No such file or directory: 'PTB.pt'.

    Best regards, Ron

    opened by ronsoohyeong 2
  • question about visualization

    Hello, I am interested in ON-LSTM and I want to reimplement it under Keras.

    I trained a Chinese language model with ON-LSTM and then exported the distances (as you do in https://github.com/yikangshen/Ordered-Neurons/blob/master/ON_LSTM.py#L89 ). However, I found that all elements of the distance are quite close to 1 (0.995, 0.99, 0.98, ...).

    Is this a normal phenomenon in your experience?

    opened by bojone 2
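
    For anyone reproducing these numbers: the distances come from the master gates, which the paper builds with the cumax activation, and a gate can be summarized as an expected split point. A minimal sketch following the paper's definitions (not the repository's exact code):

      import torch
      import torch.nn.functional as F

      def cumax(x, dim=-1):
          # Cumulative softmax from the paper: a soft, monotonically
          # increasing gate that rises from ~0 to exactly 1.
          return torch.cumsum(F.softmax(x, dim=dim), dim=dim)

      # Treating f_tilde[k] as P(split <= k), the expected split point is
      # sum_k (1 - f_tilde[k]); dividing by the gate length rescales it to [0, 1].
      f_tilde = cumax(torch.randn(10))
      distance = (1.0 - f_tilde).sum() / f_tilde.numel()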
  • High performance for right-branching strategy

    I really appreciate your releasing the code.

    I found that when testing the right-branching baseline on the WSJ test set, the F1 is really high (39.87), which does not match the result in the paper (16.5).

    I have just changed the code

    distance = model.distance[0].squeeze().data.cpu().numpy()
    distance_in = model.distance[1].squeeze().data.cpu().numpy()
    

    into

    distance = numpy.array([numpy.arange(len(sen), 0, -1)] * 3)
    distance_in = numpy.array([numpy.arange(len(sen), 0, -1)] * 3)
    

    which represents a right-branching strategy.

    And the result on the WSJ test set is: [image]

    So, what may be the reason? Thanks a lot if you could help me out.

    opened by marcwww 2
  • Confusion on eq. 15

    Dear Yikang,

    I am new to NLP but I really like this paper and appreciate your work. Now I am reading the paper and have a question about Eq. 15 in Section 5.2. What does $y_t$ mean? Thank you!

    opened by amy-deng 2
  • hyperlink of "Penn Treebank Parsed" is broken

    The hyperlink of "Penn Treebank Parsed" in README.md is broken and I cannot find the correct download link. Is this the same? I found this link on the NLTK data page.

    opened by hitvoice 2
  • cur_loss suddenly increases to a larger number

    Hi Yikang, thanks a lot for this awesome paper!

    When I try to run the command below,

    python main.py --batch_size 20 --dropout 0.45 --dropouth 0.3 --dropouti 0.5 --wdrop 0.45 --chunk_size 10 --seed 141 --epoch 1000

    the following error is triggered at a certain (5th) epoch:

    File "main.py", line 269, in train() File "main.py", line 245, in train elapsed * 1000 / args.log_interval, cur_loss, math.exp(cur_loss), cur_loss / math.log(2))) OverflowError: math range error

    My initial finding is that cur_loss suddenly increases to a large number (from ~5 to more than 10,000), which results in this error. However, I am not sure what causes this sudden huge increase.

    opened by xuuuluuu 1
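
    For reference, the crash itself comes from math.exp overflowing once cur_loss grows too large; a defensive logging guard (a workaround for the crash only, not a fix for the underlying divergence):

      import math

      def safe_ppl(loss, cap=700.0):
          # math.exp overflows for arguments above ~709.78 (the log of the
          # largest double); clamp before logging so training can continue.
          return math.exp(min(loss, cap))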
  • two evaluation results in test_phrase_grammar.py: which one is reported in the paper?

    It seems that there are two evaluation results in test_phrase_grammar.py (one comes from the evalb software, the other is computed by the script itself). Which one is reported in the paper? What is the difference between them?

    opened by ZhiYuanZeng 1
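
    For reference, unlabeled bracketing F1 of the kind both evaluations target can be computed from span sets; a minimal sketch that ignores evalb's length cutoffs and punctuation conventions, which are among the places the two numbers can diverge:

      def unlabeled_f1(pred_spans, gold_spans):
          # Each argument is an iterable of (start, end) constituent spans.
          pred, gold = set(pred_spans), set(gold_spans)
          tp = len(pred & gold)
          p = tp / len(pred) if pred else 0.0
          r = tp / len(gold) if gold else 0.0
          return 2 * p * r / (p + r) if p + r else 0.0  # guards against ZeroDivisionError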
  • Tuning contextual embeddings with hierarchical relations

    I have a masked LM pretrained with BERT.

    The embeddings are poor at the sentence level, but do well for base tokens. There is a natural tree structure to my corpus that I believe stands to gain from something like ON-LSTM.

    Do you think swapping out the embedding layer of the ON-LSTM for pretrained BERT embeddings could be fruitful?

    opened by goaaron 1
  • Question about the model design details

    Hi, thanks for sharing the source code.

    According to Equation (10) in your paper, I guess the last element of $\tilde{i}_t$ will always be zero, e.g., [0.8, 0.3, 0.1, 0]. Is this on purpose? If so, could you please explain why? I think this will let the topmost neuron chunk keep copying history without writing in anything new; is this correct?

    opened by speedcell4 1
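
    The observation above follows directly from the definitions: cumax is a cumulative sum of a softmax, so its last entry is exactly 1, and $\tilde{i}_t = 1 - \mathrm{cumax}(\cdot)$ therefore ends in exactly 0. A short check with a hypothetical input:

      import torch
      import torch.nn.functional as F

      x = torch.randn(5)
      i_tilde = 1.0 - torch.cumsum(F.softmax(x, dim=0), dim=0)
      print(i_tilde[-1])  # 0 up to floating-point error, since softmax sums to 1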
  • Default Parameters

    Hi Yikang,

    If I want to reproduce your work, what parameters should I use?

    In the readme, you suggest using the default parameters in main.py. At the same time, you provide another set of parameters: "python main.py --batch_size 20 --dropout 0.45 --dropouth 0.3 --dropouti 0.5 --wdrop 0.45 --chunk_size 10 --seed 141 --epoch 1000 --data /path/to/your/data".

    Which one should I use? I tried both, and after 48 hours the quoted parameters outperform the default ones, so I would like to double-check with you.

    Thanks, Ian

    opened by YianZhang 0
  • Can main.py run without a GPU?

    I installed PyTorch 0.4 with CUDA set to none. After I ran main.py, I got an error: torch.cuda.LongTensor is not enabled. I haven't found a similar problem online. Do I need to use a computer with a GPU and CUDA?

    opened by huamgmin 2
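
    Regarding the question above, a common PyTorch pattern is to fall back to CPU and remap any CUDA tensors in a checkpoint when no GPU is present (a generic sketch; whether this codebase runs end to end on CPU is not guaranteed):

      import torch

      use_cuda = torch.cuda.is_available()
      # map_location='cpu' remaps CUDA tensors in the checkpoint to CPU.
      model = torch.load('PTB.pt', map_location=None if use_cuda else 'cpu')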
  • ZeroDivisionError in test_phrase_grammar.py

    Hi, when I run test_phrase_grammar.py, I get the following error:

    ZeroDivisionError: float division by zero

    This is the specific error: [image]

    opened by L0ittle 5
  • checkpoint download

    I'm sorry to bother you, but when I try to test this model, there is nowhere to download the checkpoint. With the link you have provided, I only found the '.txt' files. Where can I download 'PTB.pt'?

    FileNotFoundError: [Errno 2] No such file or directory: 'PTB.pt'

    opened by L0ittle 1