StackNN

Experiments with differentiable stacks and queues in PyTorch.

Please use stacknn-core instead!

Overview

This project implements differentiable stacks and queues in PyTorch. The data structures are designed to be easy to integrate into your own models. For example, to construct a differentiable stack and perform a push:

from StackNN.structs import Stack
stack = Stack(BATCH_SIZE, STACK_VECTOR_SIZE)
read_vectors = stack(value_vectors, pop_strengths, push_strengths)
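
The snippet below fills in that call end to end. The tensor shapes and strength values are illustrative assumptions about the interface, not documented behavior:

import torch
from StackNN.structs import Stack

BATCH_SIZE, STACK_VECTOR_SIZE = 2, 4
stack = Stack(BATCH_SIZE, STACK_VECTOR_SIZE)

# Push one batch of random vectors with full strength, popping nothing.
value_vectors = torch.randn(BATCH_SIZE, STACK_VECTOR_SIZE)
pop_strengths = torch.zeros(BATCH_SIZE)
push_strengths = torch.ones(BATCH_SIZE)

read_vectors = stack(value_vectors, pop_strengths, push_strengths)
print(read_vectors.shape)  # (BATCH_SIZE, STACK_VECTOR_SIZE) if the assumptions above hold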

For examples of more complex use cases of this library, refer to the industrial-stacknns repository.

All the code in this repository is associated with the paper Context-Free Transductions with Neural Stacks, which appeared at the Analyzing and Interpreting Neural Networks for NLP workshop at EMNLP 2018. Refer to our paper for more theoretical background on differentiable data structures.

Running a demo

Check example.ipynb for the most up-to-date demo code.

There are several experiment configurations pre-defined in configs.py. To train a model on one of these configs, do:

python run.py CONFIG_NAME

For example, to train a model on the string reversal task:

python run.py final_reverse_config

In addition to the experiment configuration argument, run.py takes several flags:

  • --model: Model type (BufferedModel or VanillaModel)
  • --controller: Controller type (LinearSimpleStructController, LSTMSimpleStructController, etc.)
  • --struct: Struct type (Stack, NullStruct, etc.)
  • --savepath: Path for saving a trained model
  • --loadpath: Path for loading a model
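
These flags can be combined with any configuration. For example, to train a buffered model with an LSTM controller on the reversal task and save the result (the save path here is just a placeholder):

python run.py final_reverse_config --model BufferedModel --controller LSTMSimpleStructController --savepath reverse_model.dat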

Documentation

You can find auto-generated documentation here.

Contributing

This project is managed by Computational Linguistics at Yale (CLAY). We welcome outside contributions in the form of pull requests. Please report any bugs on the GitHub issue tracker. If you are a Yale student interested in joining our lab, please contact Bob Frank.

Citations

If you use this codebase in your research, please cite the associated paper:

@inproceedings{hao-etal-2018-context,
    title = "Context-Free Transductions with Neural Stacks",
    author = "Hao, Yiding  and
      Merrill, William  and
      Angluin, Dana  and
      Frank, Robert  and
      Amsel, Noah  and
      Benz, Andrew  and
      Mendelsohn, Simon",
    booktitle = "Proceedings of the 2018 {EMNLP} Workshop {B}lackbox{NLP}: Analyzing and Interpreting Neural Networks for {NLP}",
    month = nov,
    year = "2018",
    address = "Brussels, Belgium",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/W18-5433",
    pages = "306--315",
    abstract = "This paper analyzes the behavior of stack-augmented recurrent neural network (RNN) models. Due to the architectural similarity between stack RNNs and pushdown transducers, we train stack RNN models on a number of tasks, including string reversal, context-free language modelling, and cumulative XOR evaluation. Examining the behavior of our networks, we show that stack-augmented RNNs can discover intuitive stack-based strategies for solving our tasks. However, stack RNNs are more difficult to train than classical architectures such as LSTMs. Rather than employ stack-based strategies, more complex stack-augmented networks often find approximate solutions by using the stack as unstructured memory.",
}

Dependencies

The core implementation of the data structures is stable in Python 2 and 3. The specific tasks that we have implemented require Python 2.7. We use PyTorch version 0.4.1, with the following additional dependencies:

  • numpy
  • scipy (for data processing)
  • matplotlib (for visualization)
  • nltk

Using pip or conda should suffice for installing most of these dependencies. To get the right command for installing PyTorch, refer to the installation widget on the PyTorch website.

Models

A model is a pairing of a controller network with a neural data structure. There are two kinds of models:

  • models.VanillaModel is a simple controller-data structure network. This means there will be one step of computation per input.
  • models.BufferedModel adds input and output buffers to the vanilla model. This allows the network to run for extra computation steps.

To use a model, call model.forward() on every input and model.init_controller() whenever you want to reset the stack between inputs. You can find example training logic in the tasks package.

Data structures

  • structs.Stack implements the differentiable stack data structure.
  • structs.Queue implements the differentiable queue data structure.

The buffered models use read-only and write-only versions of the differentiable queue for their input and output buffers.

Tasks

The Task class defines specific tasks that models can be trained on. Below are some formal language tasks that we have explored using stack models.

String reversal

The ReverseTask trains a feed-forward controller network to do string reversal. The code generates 800 random binary strings which the network must reverse in a sequence-to-sequence fashion:

Input:   1 1 0 1 # # # #
Label:   # # # # 1 0 1 1
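
The snippet below mimics this data format; it is only an illustration, not the library's data pipeline:

import random

NULL = "#"
length = 4
string = [random.choice("01") for _ in range(length)]

# The network reads the string, then must emit it reversed while reading nulls.
x = string + [NULL] * length
y = [NULL] * length + list(reversed(string))
print("Input:   " + " ".join(x))
print("Label:   " + " ".join(y))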

By 10 epochs, the model tends to achieve 100% accuracy. The config for this task is called final_reverse_config.

Context-free language modelling

CFGTask can be used to train a context-free language model. Many interesting questions probing linguistic structure can be reduced to special cases of this general task. For example, the task can be used to model a language of balanced parentheses. The configuration for the parentheses task is final_dyck_config.
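
As an illustration of the kind of grammar involved, here is a toy balanced-parentheses grammar written with NLTK (already a dependency); it is just an example, not necessarily the grammar used by final_dyck_config:

from nltk import CFG
from nltk.parse import ChartParser

# The Dyck language over one pair of brackets.
dyck_grammar = CFG.fromstring("""
    S -> S S | '(' S ')' | '(' ')'
""")

parser = ChartParser(dyck_grammar)
tokens = list("(()())")
print(len(list(parser.parse(tokens))) > 0)  # True: the string is balanced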

Evaluation tasks

We also have a class for evaluation tasks. These are tasks where output i can be succinctly expressed as some function of inputs 0, ..., i. Examples include computing parity and evaluating reverse Polish Boolean formulae.
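
For instance, the parity target at position i is the cumulative XOR of inputs 0 through i, as this standalone snippet (independent of the Task classes) shows:

bits = [1, 0, 1, 1, 0]

# labels[i] is the XOR (parity) of bits[0..i].
labels, parity = [], 0
for b in bits:
    parity ^= b
    labels.append(parity)
print(labels)  # [1, 1, 0, 1, 1]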

Real datasets

The data folder contains several real datasets that the stack can be trained on. We should implement a task for reading in these datasets.

Comments
  • Code Readability Improvements

    Over the past couple months, I've written down some notes about ways to clean up the StackNN code.

    I think having clean code is important for several reasons. Firstly, it makes it much easier for external users to interact with our code, which helps with reproducibility and follow-up experiments. This is timely given that we are about to publish. Additionally, having organized code and documentation lets new members of CLAY get up to speed quickly in the next semester. Another plus of having a clean code base is that we may be able to submit a pull request to integrate StackNN with pytorch.contrib!

    With all of these concerns in mind, I have several suggestions for ways that we can clean up the existing code base. We should discuss which of these refactorings are the easiest/most important, as well as whether anything is missing.

    Standardizing Class Names with Paper Terminology

    • The Controller hierarchy should be renamed to Model. These classes don’t actually affect the nature of the controller being used, but just the overall model architecture. The SimpleStructNetwork hierarchy should be renamed to Controller. These classes specify the architecture of the controller unit which interfaces with the stack.
    • Drop “Abstract” from AbstractController (so it will end up just as Model).
    • The models.networks subpackage should be made into a top-level package called controllers.

    Documentation Style

    I propose we adapt the documentation of functions and classes to follow the Google Python style guide (what TensorFlow uses) or an equivalent. This is somewhat subjective, based on what seems more readable/aesthetically pleasing, but I find the Google style much easier to read. Converting the current documentation comments to that format shouldn’t be much of an effort, and it can be done on a frequency-of-use basis (whenever we touch a file, we update its documentation). Example Google-style docstring:

    def my_function(arg1, arg2):
      """One-line description of function.
    
      Additional in-depth description goes in paragraphs here.
    
      Args:
        arg1: This arg does this.
        arg2: This arg does this.
    
      Returns:
        Always None.
      """
      return None
    

    Performance and Compatibility

    • Removing one-hot encodings. In some cases, replacing one-hot encodings with embedding layers might yield decent performance gains (see the sketch after this list).
    • Newest PyTorch. This would be nice so we could integrate new features from a rapidly evolving framework.
    • Python 3. Not really a high priority, but something to consider given that Python 2 will be deprecated in 2020.
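
    A sketch of the embedding idea from the first bullet (the names here are generic, not taken from our code):

    import torch
    import torch.nn as nn

    vocab_size, embedding_dim = 5, 8
    token_ids = torch.tensor([0, 3, 1])

    # One-hot followed by a matrix multiply touches mostly-zero entries...
    one_hot = torch.eye(vocab_size)[token_ids]
    weight = torch.randn(vocab_size, embedding_dim)
    slow = one_hot.mm(weight)

    # ...while nn.Embedding is just an index lookup into the same weight matrix.
    embedding = nn.Embedding(vocab_size, embedding_dim)
    embedding.weight.data.copy_(weight)
    fast = embedding(token_ids)

    assert torch.allclose(slow, fast)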

    Random Stuff

    • Model and controller become required arguments of task constructor.
    • Constructor kwargs to Model (Controller) should not be passed as kwargs to Controller (Network)
    • Should we move tasks/configs.py out of tasks package and instead make it top-level?
    • Cleaning up constructor arguments/default values. The task class hierarchies have default values for arguments at several layers, which can make it confusing where a value is actually being set. I propose that each value should be specified at only one layer. For most task parameters, this should probably be the concrete class. If there is a set of parameters that applies to all tasks, we can remove them from the concrete constructors and let a **kwargs parameter in the concrete classes override them.
    • nit: Change Logger to use __enter__/__exit__ hooks instead of __del__. Or use built-in logging module.
    opened by viking-sudo-rm 5
  • Different non-linearity in controller

    Hi all, thank you for publishing your code. I have a question regarding the non-linearity in the output layer of the controller. In their paper, Grefenstette et al. use a sigmoid for the pop and push signals and tanh for the value and output vectors. In your code, you use sigmoids for all outputs. Is that on purpose? Did you experiment with tanh?

    opened by johny-c 2
  • Struct optimization

    Changed implementation of structs to use a list of Variables instead of a big Variable.

    Stack and queue test cases pass. On the Linzen task, one epoch takes about 9 minutes with this new implementation.

    In the future, we might want to make it so that entries with zero weight in each batch get deleted from the list. I didn't implement this because I wasn't sure it would actually be efficient (Having to check each item of the vector in each batch at every time step).

    Fixes #29.

    opened by viking-sudo-rm 1
  • Linzen task

    Implemented Linzen agreement task.

    The Linzen task now gets 96.7% accuracy after 20 epochs. I disabled our custom initialization in favor of the default initialization, and it increased performance. This could explain the odd LSTM behavior that we have observed more generally.

    This is still not replicating the reported results, however. The other main architectural difference as far as I can tell is the softmax versus sigmoid loss function.

    opened by viking-sudo-rm 1
  • Replace max_x_length/max_y_length with max_length

    The Task class should just have a max_length parameter.

    Right now, we have max_x_length and max_y_length at the bottom level, both of which are used in weird and seemingly inconsistent ways.

    Most concrete tasks unify these values anyway. Would probably make sense to always make them unified.

    refactor 
    opened by viking-sudo-rm 1
  • Pass task parameters as dictionary or object.

    Currently, task parameters are passed as a bunch of named arguments. Example from base.py:

    class Task(object):
        """
        Abstract class for creating experiments that train and evaluate a
        neural network model with a neural stack or queue.
        """
        __metaclass__ = ABCMeta
    
        def __init__(self,
                     batch_size=10,
                     clipping_norm=None,
                     criterion=nn.CrossEntropyLoss(),
                     cuda=False,
                     epochs=100,
                     early_stopping_steps=5,
                     hidden_size=10,
                     learning_rate=0.01,
                     load_path=None,
                     l2_weight=.01,
                     max_x_length=10,
                     max_y_length=10,
                     model_type=VanillaModel,
                     controller_type=LinearSimpleStructController,
                     null=u"#",
                     read_size=2,
                     reg_weight=1.,
                     save_path=None,
                     struct_type=Stack,
                     time_function=(lambda t: t),
                     verbose=True):
    

    This should be replaced by a signature of the format:

    class Task(object):
        """
        Abstract class for creating experiments that train and evaluate a
        neural network model with a neural stack or queue.
        """
        __metaclass__ = ABCMeta
    
        def __init__(self, params):
    

    where params is a TaskParams object with configurable parameters. This makes the inheritance patterns between different types of tasks much more readable. It also makes sure that the default values for all the TaskParams are set in exactly one place, which makes debugging easier.

    Note: TaskParams could extend dictionary, wrap a dictionary, or params could just be of type dictionary. There are several ways to go about this.
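
    One rough way this could look (purely illustrative; the class and field names are not settled):

    class TaskParams(object):
        """Single home for task defaults; tasks just read from it."""

        DEFAULTS = {"batch_size": 10, "epochs": 100, "learning_rate": 0.01}

        def __init__(self, **overrides):
            params = dict(self.DEFAULTS)
            params.update(overrides)
            self.__dict__.update(params)

    # Usage: Task(TaskParams(batch_size=32)) overrides one default and keeps the rest.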

    help wanted good first issue refactor 
    opened by viking-sudo-rm 1
  • (P)CFG-Based Language Models

    This is to add the code we worked on with Dana for (P)CFG-based language models. The code is not yet complete.

    Somehow I can't push to your repository, so I pushed to my fork and made this pull request.

    cfg.py contains Dana's version of the code. parentheses.py is my version.

    https://github.com/viking-sudo-rm/StackNN/compare/master...yidinghao:master?expand=1#diff-265097babe6c48f38dcedf6a873bd1fc

    opened by yidinghao 1
  • Implemented ControlLayer for stack interface.

    The control layer takes a basic vector as input and returns a push vector, pop strength, push strength, and read strength.

    The current form of the layer is very basic. The computation is as follows (a rough sketch appears after the list):

    • push vector: Linear transformation with tanh.
    • push strength: Linear transformation with sigmoid.
    • pop strength: Expected value of a softmax distribution produced from a linear transformation.
    • read strength: Expected value of a softmax distribution produced from a linear transformation.
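
    A rough nn.Module paraphrase of that computation (this is not the actual ControlLayer code; max_strength and all other names are placeholders):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ControlLayerSketch(nn.Module):

        def __init__(self, input_size, stack_size, max_strength=4):
            super(ControlLayerSketch, self).__init__()
            self.value = nn.Linear(input_size, stack_size)   # push vector
            self.push = nn.Linear(input_size, 1)             # push strength
            self.pop = nn.Linear(input_size, max_strength)   # pop strength logits
            self.read = nn.Linear(input_size, max_strength)  # read strength logits

        def forward(self, hidden):
            value = torch.tanh(self.value(hidden))
            push = torch.sigmoid(self.push(hidden)).squeeze(-1)
            # "Expected value of a softmax distribution": weight each integer
            # strength 0 .. max_strength - 1 by its softmax probability.
            strengths = torch.arange(self.pop.out_features,
                                     dtype=hidden.dtype, device=hidden.device)
            pop = (F.softmax(self.pop(hidden), dim=-1) * strengths).sum(-1)
            read = (F.softmax(self.read(hidden), dim=-1) * strengths).sum(-1)
            return value, pop, push, read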

    We should probably test the layer implemented this way on Reverse, Linzen, etc. I can do some testing when I get back to school in a few days.

    Also, there are some extensions worth trying, such as doing multi-pop/multi-read (instead of taking expected value), or allowing multiple read heads in parallel.

    opened by viking-sudo-rm 0
  • Optimize core data structure logic.

    Can store stack as a list of Variables instead of a Variable that needs to constantly be reallocated.

    This will make pushing, popping, and reading O(1) instead of O(n), which should drastically improve performance.
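
    A minimal illustration of the difference (not the repo's actual struct code):

    import torch

    # O(n) per push: reallocate and copy the whole memory tensor every step.
    memory = torch.zeros(0, 4)
    for _ in range(3):
        memory = torch.cat([memory, torch.randn(1, 4)], dim=0)

    # O(1) per push/pop/read: keep a Python list and only build a dense view when needed.
    rows = [torch.randn(4) for _ in range(3)]
    top = rows.pop()               # pop is O(1)
    full_view = torch.stack(rows)  # only pay for the copy when a full view is required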

    opened by viking-sudo-rm 0
  • Varied configurations for train and test runs on CFG tasks

    This PR accomplishes a few things:

    • Establishes a mechanism for varying parameters between training and test data for formal tasks (open to discussion about the best way to do this, or if there is already another mechanism for doing this)
    • Updates the Dyck task to respect test-specific configurations
    • Updates the Dyck task configuration to follow the same parameters for testing as were used in the paper
    • Adds support for using the LSTMVisualizer on the Dyck task

    It seems to work (and get similar numbers to what was in the paper), but the current implementation slows down training substantially. I still need to dig into why that is (perhaps because max_x_length is now bigger in training?), but should probably wait to merge this until that's solved.

    opened by LK 0
  • Try Linzen task with binary cross entropy.

    The original Linzen paper used binary cross entropy instead of softmax cross entropy.

    https://github.com/TalLinzen/rnn_agreement/blob/master/rnnagr/agreement_acceptor.py

    Results are not replicating with softmax cross entropy. We can try switching to binary cross entropy.
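
    Concretely, the switch would look roughly like this (shapes and names are illustrative; note that sigmoid(l1 - l0) equals the softmax probability of class 1, so the two losses are closely related):

    import torch
    import torch.nn as nn

    logits = torch.randn(8, 2)           # (batch, num_classes) for softmax cross entropy
    targets = torch.randint(0, 2, (8,))  # integer class labels

    softmax_loss = nn.CrossEntropyLoss()(logits, targets)

    # Binary cross entropy: one logit per example and a float 0/1 target.
    binary_logits = logits[:, 1] - logits[:, 0]
    bce_loss = nn.BCEWithLogitsLoss()(binary_logits, targets.float())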

    opened by viking-sudo-rm 0
  • CUDA compatibility

    I'm not quite sure if the StackNN code is fully CUDA compatible, but recently I had some issues trying to use it.

    Should investigate this and add cuda options everywhere in structs/control_layer.
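
    One common source of such issues is tensors allocated inside forward() that silently default to CPU; a generic guard, not tied to any particular place in our code, looks like:

    import torch

    def zeros_like_input(reference, *size):
        # Allocate scratch tensors on whatever device the inputs already live on.
        return torch.zeros(*size, dtype=reference.dtype, device=reference.device)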

    opened by viking-sudo-rm 1