StackNN

Experiments with differentiable stacks and queues in PyTorch.

Please use stacknn-core instead!

Overview

This project implements differentiable stacks and queues in PyTorch. The data structures are designed to be easy to integrate into your own models. For example, to construct a differentiable stack and perform a push:

from StackNN.structs import Stack
stack = Stack(BATCH_SIZE, STACK_VECTOR_SIZE)
read_vectors = stack(value_vectors, pop_strengths, push_strengths)
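
The snippet below fills in that call end to end. The tensor shapes and strength values are illustrative assumptions about the interface, not documented behavior:

import torch
from StackNN.structs import Stack

BATCH_SIZE, STACK_VECTOR_SIZE = 2, 4
stack = Stack(BATCH_SIZE, STACK_VECTOR_SIZE)

# Push one batch of random vectors with full strength, popping nothing.
value_vectors = torch.randn(BATCH_SIZE, STACK_VECTOR_SIZE)
pop_strengths = torch.zeros(BATCH_SIZE)
push_strengths = torch.ones(BATCH_SIZE)

read_vectors = stack(value_vectors, pop_strengths, push_strengths)
print(read_vectors.shape)  # (BATCH_SIZE, STACK_VECTOR_SIZE) if the assumptions above hold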

For examples of more complex use cases of this library, refer to the industrial-stacknns repository.

All the code in this repository is associated with the paper Context-Free Transductions with Neural Stacks, which appeared at the Analyzing and Interpreting Neural Networks for NLP workshop at EMNLP 2018. Refer to our paper for more theoretical background on differentiable data structures.

Running a demo

Check example.ipynb for the most up-to-date demo code.

There are several experiment configurations pre-defined in configs.py. To train a model on one of these configs, do:

python run.py CONFIG_NAME

For example, to train a model on the string reversal task:

python run.py final_reverse_config

In addition to the experiment configuration argument, run.py takes several flags:

  • --model: Model type (BufferedModel or VanillaModel)
  • --controller: Controller type (LinearSimpleStructController, LSTMSimpleStructController, etc.)
  • --struct: Struct type (Stack, NullStruct, etc.)
  • --savepath: Path for saving a trained model
  • --loadpath: Path for loading a model
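
These flags can be combined with any configuration. For example, to train a buffered model with an LSTM controller on the reversal task and save the result (the save path here is just a placeholder):

python run.py final_reverse_config --model BufferedModel --controller LSTMSimpleStructController --savepath reverse_model.dat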

Documentation

You can find auto-generated documentation here.

Contributing

This project is managed by Computational Linguistics at Yale (CLAY). We welcome outside contributions in the form of pull requests. Please report any bugs on the GitHub issue tracker. If you are a Yale student interested in joining our lab, please contact Bob Frank.

Citations

If you use this codebase in your research, please cite the associated paper:

@inproceedings{hao-etal-2018-context,
    title = "Context-Free Transductions with Neural Stacks",
    author = "Hao, Yiding  and
      Merrill, William  and
      Angluin, Dana  and
      Frank, Robert  and
      Amsel, Noah  and
      Benz, Andrew  and
      Mendelsohn, Simon",
    booktitle = "Proceedings of the 2018 {EMNLP} Workshop {B}lackbox{NLP}: Analyzing and Interpreting Neural Networks for {NLP}",
    month = nov,
    year = "2018",
    address = "Brussels, Belgium",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/W18-5433",
    pages = "306--315",
    abstract = "This paper analyzes the behavior of stack-augmented recurrent neural network (RNN) models. Due to the architectural similarity between stack RNNs and pushdown transducers, we train stack RNN models on a number of tasks, including string reversal, context-free language modelling, and cumulative XOR evaluation. Examining the behavior of our networks, we show that stack-augmented RNNs can discover intuitive stack-based strategies for solving our tasks. However, stack RNNs are more difficult to train than classical architectures such as LSTMs. Rather than employ stack-based strategies, more complex stack-augmented networks often find approximate solutions by using the stack as unstructured memory.",
}

Dependencies

The core implementation of the data structures is stable in Python 2 and 3. The specific tasks that we have implemented require Python 2.7. We use PyTorch version 0.4.1, with the following additional dependencies:

  • numpy
  • scipy (for data processing)
  • matplotlib (for visualization)
  • nltk

Using pip or conda should suffice for installing most of these dependencies. To get the right command for installing PyTorch, refer to the installation widget on the PyTorch website.

Models

A model is a pairing of a controller network with a neural data structure. There are two kinds of models:

  • models.VanillaModel is a simple controller-data structure network. This means there will be one step of computation per input.
  • models.BufferedModel adds input and output buffers to the vanilla model. This allows the network to run for extra computation steps.

To use a model, call model.forward() on every input and model.init_controller() whenever you want to reset the stack between inputs. You can find example training logic in the tasks package.

Data structures

  • structs.Stack implements the differentiable stack data structure.
  • structs.Queue implements the differentiable queue data structure.

The buffered models use read-only and write-only versions of the differentiable queue for their input and output buffers.

Tasks

The Task class defines specific tasks that models can be trained on. Below are some formal language tasks that we have explored using stack models.

String reversal

The ReverseTask trains a feed-forward controller network to do string reversal. The code generates 800 random binary strings which the network must reverse in a sequence-to-sequence fashion:

Input:   1 1 0 1 # # # #
Label:   # # # # 1 0 1 1
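
The snippet below mimics this data format; it is only an illustration, not the library's data pipeline:

import random

NULL = "#"
length = 4
string = [random.choice("01") for _ in range(length)]

# The network reads the string, then must emit it reversed while reading nulls.
x = string + [NULL] * length
y = [NULL] * length + list(reversed(string))
print("Input:   " + " ".join(x))
print("Label:   " + " ".join(y))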

By 10 epochs, the model tends to achieve 100% accuracy. The config for this task is called final_reverse_config.

Context-free language modelling

CFGTask can be used to train a context-free language model. Many interesting questions probing linguistic structure can be reduced to special cases of this general task. For example, the task can be used to model a language of balanced parentheses. The configuration for the parentheses task is final_dyck_config.
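
As an illustration of the kind of grammar involved, here is a toy balanced-parentheses grammar written with NLTK (already a dependency); it is just an example, not necessarily the grammar used by final_dyck_config:

from nltk import CFG
from nltk.parse import ChartParser

# The Dyck language over one pair of brackets.
dyck_grammar = CFG.fromstring("""
    S -> S S | '(' S ')' | '(' ')'
""")

parser = ChartParser(dyck_grammar)
tokens = list("(()())")
print(len(list(parser.parse(tokens))) > 0)  # True: the string is balanced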

Evaluation tasks

We also have a class for evaluation tasks. These are tasks where output i can be succinctly expressed as some function of inputs 0, ..., i. Examples include computing parity and evaluating reverse Polish Boolean formulae.
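
For instance, the parity target at position i is the cumulative XOR of inputs 0 through i, as this standalone snippet (independent of the Task classes) shows:

bits = [1, 0, 1, 1, 0]

# labels[i] is the XOR (parity) of bits[0..i].
labels, parity = [], 0
for b in bits:
    parity ^= b
    labels.append(parity)
print(labels)  # [1, 1, 0, 1, 1]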

Real datasets

The data folder contains several real datasets that the stack can be trained on. We should implement a task for reading in these datasets.

Comments
  • Code Readability Improvements

    Over the past couple months, I've written down some notes about ways to clean up the StackNN code.

    I think having clean code is important for several reasons. Firstly, it makes it much easier for external users to interact with our code, which helps with reproducibility and follow-up experiments. This is timely given that we are about to publish. Additionally, having organized code and documentation lets new members of CLAY get up to speed quickly in the next semester. Another plus of having a clean code base is that we may be able to submit a pull request to integrate StackNN with pytorch.contrib!

    With all of these concerns in mind, I have several suggestions for ways that we can clean up the existing code base. We should discuss which of these refactorings are the easiest/most important, as well as whether anything is missing.

    Standardizing Class Names with Paper Terminology

    • The Controller hierarchy should be renamed to Model. These classes don’t actually affect the nature of the controller being used, but just the overall model architecture. The SimpleStructNetwork hierarchy should be renamed to Controller. These classes specify the architecture of the controller unit which interfaces with the stack.
    • Drop “Abstract” from AbstractController (so it will end up just as Model).
    • The models.networks subpackage should be made into a top-level package called controllers.

    Documentation Style

    I propose we adapt the documentation of functions and classes to follow the Google Python style guide (what TensorFlow uses) or an equivalent. This is somewhat subjective, based on what seems more readable/aesthetically pleasing, but I find the Google style much easier to read. Converting the current documentation comments to that format shouldn’t be much of an effort, and it can be done on a frequency-of-use basis (whenever we touch a file, we update its documentation). Example Google-style docstring:

    def my_function(arg1, arg2):
      """One-line description of function.
    
      Additional in-depth description goes in paragraphs here.
    
      Args:
        arg1: This arg does this.
        arg2: This arg does this.
    
      Returns:
        Always None.
      """
      return None
    

    Performance and Compatibility

    • Removing one-hot encodings. In some cases, replacing one-hot encodings with embedding layers might yield decent performance gains (see the sketch after this list).
    • Newest PyTorch. This would be nice so we could integrate new features from a rapidly evolving framework.
    • Python 3. Not really a high priority, but something to consider given that Python 2 will be deprecated in 2020.
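
    A sketch of the embedding idea from the first bullet (the names here are generic, not taken from our code):

    import torch
    import torch.nn as nn

    vocab_size, embedding_dim = 5, 8
    token_ids = torch.tensor([0, 3, 1])

    # One-hot followed by a matrix multiply touches mostly-zero entries...
    one_hot = torch.eye(vocab_size)[token_ids]
    weight = torch.randn(vocab_size, embedding_dim)
    slow = one_hot.mm(weight)

    # ...while nn.Embedding is just an index lookup into the same weight matrix.
    embedding = nn.Embedding(vocab_size, embedding_dim)
    embedding.weight.data.copy_(weight)
    fast = embedding(token_ids)

    assert torch.allclose(slow, fast)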

    Random Stuff

    • Model and controller become required arguments of task constructor.
    • Constructor kwargs to Model (Controller) should not be passed as kwargs to Controller (Network)
    • Should we move tasks/configs.py out of tasks package and instead make it top-level?
    • Cleaning up constructor arguments/default values. The task class hierarchies have default values for arguments at several layers, which can make it confusing where a value is actually being set. I propose that each value should be specified at only one layer. For most task parameters, this should probably be the concrete class. If there is a set of parameters that applies to all tasks, we can remove them from the concrete constructors and let a **kwargs parameter in the concrete classes override them.
    • nit: Change Logger to use __enter__/__exit__ hooks instead of __del__. Or use built-in logging module.
    opened by viking-sudo-rm 5
  • Different non-linearity in controller

    Hi all, thank you for publishing your code. I have a question regarding the non-linearity in the output layer of the controller. In their paper, Grefenstette et al. use a sigmoid for the pop and push signals and tanh for the value and output vectors. In your code, you use sigmoids for all outputs. Is that on purpose? Did you experiment with tanh?

    opened by johny-c 2
  • Struct optimization

    Changed implementation of structs to use a list of Variables instead of a big Variable.

    Stack and queue test cases pass. On the Linzen task, one epoch takes about 9 minutes with this new implementation.

    In the future, we might want to make it so that entries with zero weight in each batch get deleted from the list. I didn't implement this because I wasn't sure it would actually be efficient (Having to check each item of the vector in each batch at every time step).

    Fixes #29.

    opened by viking-sudo-rm 1
  • Linzen task

    Implemented Linzen agreement task.

    The Linzen task now gets 96.7% accuracy after 20 epochs. I disabled our custom initialization in favor of the default initialization, and it increased performance. This could explain the odd LSTM behavior that we have observed more generally.

    This is still not replicating the reported results, however. The other main architectural difference as far as I can tell is the softmax versus sigmoid loss function.

    opened by viking-sudo-rm 1
  • Replace max_x_length/max_y_length with max_length

    The Task class should just have a max_length parameter.

    Right now, we have max_x_length and max_y_length at the bottom level, both of which are used in weird and seemingly inconsistent ways.

    Most concrete tasks unify these values anyway. Would probably make sense to always make them unified.

    refactor 
    opened by viking-sudo-rm 1
  • Pass task parameters as dictionary or object.

    Currently, task parameters are passed as a bunch of named arguments. Example from base.py:

    class Task(object):
        """
        Abstract class for creating experiments that train and evaluate a
        neural network model with a neural stack or queue.
        """
        __metaclass__ = ABCMeta
    
        def __init__(self,
                     batch_size=10,
                     clipping_norm=None,
                     criterion=nn.CrossEntropyLoss(),
                     cuda=False,
                     epochs=100,
                     early_stopping_steps=5,
                     hidden_size=10,
                     learning_rate=0.01,
                     load_path=None,
                     l2_weight=.01,
                     max_x_length=10,
                     max_y_length=10,
                     model_type=VanillaModel,
                     controller_type=LinearSimpleStructController,
                     null=u"#",
                     read_size=2,
                     reg_weight=1.,
                     save_path=None,
                     struct_type=Stack,
                     time_function=(lambda t: t),
                     verbose=True):
    

    This should be replaced by a signature of the format:

    class Task(object):
        """
        Abstract class for creating experiments that train and evaluate a
        neural network model with a neural stack or queue.
        """
        __metaclass__ = ABCMeta
    
        def __init__(self, params):
    

    where params is a TaskParams object with configurable parameters. This makes the inheritance patterns between different types of tasks much more readable. It also makes sure that the default values for all the TaskParams are set in exactly one place, which makes debugging easier.

    Note: TaskParams could extend dictionary, wrap a dictionary, or params could just be of type dictionary. There are several ways to go about this.
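
    One rough way this could look (purely illustrative; the class and field names are not settled):

    class TaskParams(object):
        """Single home for task defaults; tasks just read from it."""

        DEFAULTS = {"batch_size": 10, "epochs": 100, "learning_rate": 0.01}

        def __init__(self, **overrides):
            params = dict(self.DEFAULTS)
            params.update(overrides)
            self.__dict__.update(params)

    # Usage: Task(TaskParams(batch_size=32)) overrides one default and keeps the rest.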

    help wanted good first issue refactor 
    opened by viking-sudo-rm 1
  • (P)CFG-Based Language Models

    This is to add the code we worked on with Dana for (P)CFG-based language models. The code is not yet complete.

    Somehow I can't push to your repository, so I pushed to my fork and made this pull request.

    cfg.py contains Dana's version of the code. parentheses.py is my version.

    https://github.com/viking-sudo-rm/StackNN/compare/master...yidinghao:master?expand=1#diff-265097babe6c48f38dcedf6a873bd1fc

    opened by yidinghao 1
  • Implemented ControlLayer for stack interface.

    The control layer takes a basic vector as input and returns a push vector, pop strength, push strength, and read strength.

    The current form of the layer is very basic. The computation is as follows (a rough sketch appears after the list):

    • push vector: Linear transformation with tanh.
    • push strength: Linear transformation with sigmoid.
    • pop strength: Expected value of a softmax distribution produced from a linear transformation.
    • read strength: Expected value of a softmax distribution produced from a linear transformation.
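
    A rough nn.Module paraphrase of that computation (this is not the actual ControlLayer code; max_strength and all other names are placeholders):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ControlLayerSketch(nn.Module):

        def __init__(self, input_size, stack_size, max_strength=4):
            super(ControlLayerSketch, self).__init__()
            self.value = nn.Linear(input_size, stack_size)   # push vector
            self.push = nn.Linear(input_size, 1)             # push strength
            self.pop = nn.Linear(input_size, max_strength)   # pop strength logits
            self.read = nn.Linear(input_size, max_strength)  # read strength logits

        def forward(self, hidden):
            value = torch.tanh(self.value(hidden))
            push = torch.sigmoid(self.push(hidden)).squeeze(-1)
            # "Expected value of a softmax distribution": weight each integer
            # strength 0 .. max_strength - 1 by its softmax probability.
            strengths = torch.arange(self.pop.out_features,
                                     dtype=hidden.dtype, device=hidden.device)
            pop = (F.softmax(self.pop(hidden), dim=-1) * strengths).sum(-1)
            read = (F.softmax(self.read(hidden), dim=-1) * strengths).sum(-1)
            return value, pop, push, read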

    We should probably test the layer implemented this way on Reverse, Linzen, etc. I can do some testing when I get back to school in a few days.

    Also, there are some extensions worth trying, such as doing multi-pop/multi-read (instead of taking expected value), or allowing multiple read heads in parallel.

    opened by viking-sudo-rm 0
  • Optimize core data structure logic.

    Can store stack as a list of Variables instead of a Variable that needs to constantly be reallocated.

    This will make pushing, popping, and reading O(1) instead of O(n), which should drastically improve performance.
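
    A minimal illustration of the difference (not the repo's actual struct code):

    import torch

    # O(n) per push: reallocate and copy the whole memory tensor every step.
    memory = torch.zeros(0, 4)
    for _ in range(3):
        memory = torch.cat([memory, torch.randn(1, 4)], dim=0)

    # O(1) per push/pop/read: keep a Python list and only build a dense view when needed.
    rows = [torch.randn(4) for _ in range(3)]
    top = rows.pop()               # pop is O(1)
    full_view = torch.stack(rows)  # only pay for the copy when a full view is required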

    opened by viking-sudo-rm 0
  • Varied configurations for train and test runs on CFG tasks

    This PR accomplishes a few things:

    • Establishes a mechanism for varying parameters between training and test data for formal tasks (open to discussion about the best way to do this, or if there is already another mechanism for doing this)
    • Updates the Dyck task to respect test-specific configurations
    • Updates the Dyck task configuration to follow the same parameters for testing as were used in the paper
    • Adds support for using the LSTMVisualizer on the Dyck task

    It seems to work (and get similar numbers to what was in the paper), but the current implementation slows down training substantially. I still need to dig into why that is (perhaps because max_x_length is now bigger in training?), but should probably wait to merge this until that's solved.

    opened by LK 0
  • Try Linzen task with binary cross entropy.

    The original Linzen paper used binary cross entropy instead of softmax cross entropy.

    https://github.com/TalLinzen/rnn_agreement/blob/master/rnnagr/agreement_acceptor.py

    Results are not replicating with softmax cross entropy. We can try switching to binary cross entropy.
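
    Concretely, the switch would look roughly like this (shapes and names are illustrative; note that sigmoid(l1 - l0) equals the softmax probability of class 1, so the two losses are closely related):

    import torch
    import torch.nn as nn

    logits = torch.randn(8, 2)           # (batch, num_classes) for softmax cross entropy
    targets = torch.randint(0, 2, (8,))  # integer class labels

    softmax_loss = nn.CrossEntropyLoss()(logits, targets)

    # Binary cross entropy: one logit per example and a float 0/1 target.
    binary_logits = logits[:, 1] - logits[:, 0]
    bce_loss = nn.BCEWithLogitsLoss()(binary_logits, targets.float())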

    opened by viking-sudo-rm 0
  • CUDA compatibility

    I'm not quite sure if the StackNN code is fully CUDA compatible, but recently I had some issues trying to use it.

    Should investigate this and add cuda options everywhere in structs/control_layer.
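
    One common source of such issues is tensors allocated inside forward() that silently default to CPU; a generic guard, not tied to any particular place in our code, looks like:

    import torch

    def zeros_like_input(reference, *size):
        # Allocate scratch tensors on whatever device the inputs already live on.
        return torch.zeros(*size, dtype=reference.dtype, device=reference.device)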

    opened by viking-sudo-rm 1