Neural Turing Machines (NTM) - PyTorch Implementation

Guy Zana

Last update: Dec 21, 2022

Overview

PyTorch Neural Turing Machine (NTM)

PyTorch implementation of Neural Turing Machines (NTM).

An NTM is a memory-augmented neural network (a network attached to external memory) in which the interactions with the external memory (address, read, write) are performed with differentiable transformations. The network is therefore end-to-end differentiable and trainable by a gradient-based optimizer.

The NTM processes input in sequences, much like an LSTM, but with additional benefits: (1) the external memory allows the network to learn algorithmic tasks more easily, and (2) it provides larger capacity without increasing the network's trainable parameters.

The external memory allows the NTM to learn algorithmic tasks that are much harder for an LSTM to learn, and to maintain an internal state for much longer than traditional LSTMs.
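
As a rough illustration of what differentiable memory access means, the sketch below performs a content-based read over a tiny memory using only soft, weighted operations. It is plain Python rather than the repository's code; the helper names `cosine`, `softmax` and `content_read`, and the key strength `beta`, are assumptions chosen for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-16)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def content_read(memory, key, beta=10.0):
    """Soft read: weight every memory row by its similarity to `key`."""
    weights = softmax([beta * cosine(row, key) for row in memory])
    width = len(memory[0])
    read = [sum(w * row[j] for w, row in zip(weights, memory)) for j in range(width)]
    return weights, read


# Three memory rows of width 3; the key matches the second row.
memory = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weights, read = content_read(memory, key=[0.0, 1.0, 0.0])
```

Because every row contributes to the result with a non-zero weight, the read is a smooth function of the key and of the memory contents, which is what allows gradients to flow through the memory access.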

A PyTorch Implementation

This repository implements a vanilla NTM in a straightforward way. The following architecture is used:

Features

Batch learning support
Numerically stable
Flexible head configuration - use X read heads and Y write heads and specify the order of operation
Copy and repeat-copy experiments agree with the paper

Copy Task

The Copy task tests the NTM's ability to store and recall a long sequence of arbitrary information. The input to the network is a random sequence of bits, ending with a delimiter. The sequence lengths are randomised between 1 and 20.
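
A minimal sketch of how one copy-task sample could be generated is shown below. It is not the repository's data loader: the function name `make_copy_sample`, the width of 8 bits, and the choice to signal the delimiter on one extra input channel are assumptions made for illustration.

```python
import random


def make_copy_sample(rng, seq_width=8, min_len=1, max_len=20):
    """Return (inputs, targets) for one copy-task sequence.

    Each input row holds seq_width data bits plus one delimiter channel
    (assumed layout, not taken from the repository).
    """
    seq_len = rng.randint(min_len, max_len)  # inclusive on both ends
    bits = [[rng.randint(0, 1) for _ in range(seq_width)] for _ in range(seq_len)]
    inputs = [row + [0] for row in bits]      # data rows, delimiter channel off
    inputs.append([0] * seq_width + [1])      # final row carries only the delimiter
    targets = [row[:] for row in bits]        # the network should reproduce the bits
    return inputs, targets


rng = random.Random(10)
inputs, targets = make_copy_sample(rng)
```

The input is one step longer than the target because of the delimiter row.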

Training

Training convergence for the copy task using 4 different seeds (see the notebook for details)

The following plot shows the cost per sequence length during training. The network was trained with seed=10 and shows fast convergence. Other seeds may not perform as well but should converge in fewer than 30K iterations.

Evaluation

Here is an animated GIF that shows how the model generalizes. The model was evaluated after every 500 training samples, using the target sequence shown in the upper part of the image. The bottom part shows the network output at any given training stage.

The following is the same, but with sequence length = 80. Note that the network was trained with sequences of lengths 1 to 20.

Repeat Copy Task

The Repeat Copy task tests whether the NTM can learn a simple nested function, and invoke it by learning to execute a for loop. The input to the network is a random sequence of bits, followed by a delimiter and a scalar value that represents the number of repetitions to output. The number of repetitions was normalized to have zero mean and a variance of one (as in the paper). Both the length of the sequence and the number of repetitions are randomised between 1 and 10.
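
The normalization of the repetition count can be worked through as follows. This sketch assumes the count is drawn uniformly from the integers 1 to 10 and uses the closed-form mean and variance of a discrete uniform distribution; the function names are hypothetical and the exact constants used by the repository are not shown here.

```python
import math


def uniform_int_stats(low, high):
    """Mean and standard deviation of a uniform integer on [low, high]."""
    n = high - low + 1
    mean = (low + high) / 2
    var = (n * n - 1) / 12      # variance of a discrete uniform distribution
    return mean, math.sqrt(var)


def normalize_reps(reps, low=1, high=10):
    """Shift and scale a repetition count to zero mean and unit variance."""
    mean, std = uniform_int_stats(low, high)
    return (reps - mean) / std


mean, std = uniform_int_stats(1, 10)
normalized = [normalize_reps(r) for r in range(1, 11)]
```

For the range 1 to 10 the mean is 5.5 and the variance is 99 / 12 = 8.25, so a count of 10 maps to roughly 1.57 and a count of 1 to roughly -1.57.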

Training

Training convergence for the repeat-copy task using 4 different seeds (see the notebook for details)

Evaluation

The following image shows the input presented to the network: a sequence of bits + delimiter + num-reps scalar. Specifically, the sequence length here is eight and the number of repetitions is five.

And here's the output the network predicted:

Here's an animated GIF that shows how the network learns to predict the targets. Specifically, the network was evaluated at each checkpoint saved during training with the same input sequence.

Installation

The NTM can be used as a reusable module, although it is not currently packaged.

Clone repository
Install PyTorch
pip install -r requirements.txt

Usage

Execute ./train.py

usage: train.py [-h] [--seed SEED] [--task {copy,repeat-copy}] [-p PARAM]
                [--checkpoint-interval CHECKPOINT_INTERVAL]
                [--checkpoint-path CHECKPOINT_PATH]
                [--report-interval REPORT_INTERVAL]

optional arguments:
  -h, --help            show this help message and exit
  --seed SEED           Seed value for RNGs
  --task {copy,repeat-copy}
                        Choose the task to train (default: copy)
  -p PARAM, --param PARAM
                        Override model params. Example: "-pbatch_size=4
                        -pnum_heads=2"
  --checkpoint-interval CHECKPOINT_INTERVAL
                        Checkpoint interval (default: 1000). Use 0 to disable
                        checkpointing
  --checkpoint-path CHECKPOINT_PATH
                        Path for saving checkpoint data (default: './')
  --report-interval REPORT_INTERVAL
                        Reporting interval
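
The `-p` override syntax in the help text above can be reproduced with standard `argparse`. The sketch below is an assumption about how such overrides might be collected and parsed, not the code in train.py; `parse_overrides` is a hypothetical helper, and treating values as integers when possible is a choice made for this example.

```python
import argparse


def parse_overrides(pairs):
    """Turn ["batch_size=4", "num_heads=2"] into {"batch_size": 4, "num_heads": 2}."""
    overrides = {}
    for pair in pairs:
        name, sep, value = pair.partition("=")
        if not sep or not name:
            raise ValueError(f"expected name=value, got {pair!r}")
        # Assumption: keep integers as int, everything else as a string.
        try:
            overrides[name] = int(value)
        except ValueError:
            overrides[name] = value
    return overrides


parser = argparse.ArgumentParser()
parser.add_argument("--task", choices=["copy", "repeat-copy"], default="copy")
# action="append" lets -p be given several times, one override per flag.
parser.add_argument("-p", "--param", action="append", default=[])

args = parser.parse_args(["--task", "copy", "-pbatch_size=4", "-pnum_heads=2"])
overrides = parse_overrides(args.param)
```

Attaching the value directly to the short flag, as in `-pbatch_size=4`, is ordinary argparse behaviour for single-letter options, so no custom tokenizing is needed.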

Comments

Strange fluctuation on curves even after large #seqs have been trained with

(random seed=10) As the plots show, the cost still fluctuates after 120,000 sequences, which does not seem to match the results in your experiments or the original authors'. What could be the reasons? How can I cope with this? THANKS A LOT.

opened by marcwww 6

Removed the unnecessary softplus in NTMHeadBase._address_memory

Removed the softplus in the softmax:

s = F.softmax(F.softplus(s), dim=1)

softmax already constrains the values to (0, 1), so the softplus doesn't achieve anything. PyTorch's softmax implementation is already numerically stable, so that's not the concern.
opened by JulesGM 4
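
The point about the constraint can be checked numerically. The plain-Python sketch below (helper names and the sample scores are assumptions for illustration) compares `softmax(s)` with `softmax(softplus(s))`: both are valid probability distributions, although the two are not numerically identical, so removing the softplus may still change the weights a trained model produces.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softplus(x):
    """log(1 + exp(x)), written to avoid overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


s = [-2.0, 0.5, 3.0]  # arbitrary raw shift scores, chosen for illustration
plain = softmax(s)
with_softplus = softmax([softplus(x) for x in s])
```

In this example the softplus variant is flatter, because softplus compresses the negative scores towards zero before the softmax is applied.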

Different results between testing midway through training and at the end of training

Dear author,

I've forked your repo (at https://github.com/marcwww/pytorch-ntm), mainly to test the model on longer sequences (for example, training on sequences of lengths 1 to 10 and testing on sequences of lengths 11 to 20).

The problem is that the final test result after training without any intermediate testing differs from the result obtained when testing is also performed at intervals during training. The forked repo uses the latter setting (at https://github.com/marcwww/pytorch-ntm/blob/1d0595e165a6790219df76e0b7f13b48e406b4d9/train_test.py#L236).

In the forked repo, test batches are sampled in the same way as training batches (at https://github.com/marcwww/pytorch-ntm/blob/1d0595e165a6790219df76e0b7f13b48e406b4d9/tasks/copytask_test.py#L16). I also tried to check whether the difference comes from the interleaved sampling of training and test batches by loading a pre-generated test set, but that does not help.

Could you please help me with this? Thanks a lot.

opened by marcwww 3

Various PyTorch 0.4.0 updates

This commit fixes a few deprecation warnings thrown by PyTorch 0.4.0.

Merge of Variable and Tensor:

change Variable(a_tensor) to a_tensor

Support for PyTorch Scalars of 0-dimension

change loss.data[0] to loss.item()

Naming conventions in nn.init

change nn.init.constant to nn.init.constant_

change nn.init.uniform to nn.init.uniform_

change nn.init.xavier_uniform to nn.init.xavier_uniform_

This pull request does not fix TypeError: Object of type 'Tensor' is not JSON serializable thrown by save_checkpoint on line 83 of train.py, which seems to be an issue new to PyTorch 0.4.0.

Thanks! This is a great implementation!!! Mark
opened by marikgoldstein 3
Convergence is really slow with copy task when sequence length is smaller

Hi,

I have tried to run the copy task with the default parameters (controller_size=100, controller_layers=1, num_heads=1, sequence_width=8, sequence_min_len=1, sequence_max_len=20, memory_n=128, memory_m=20, batch_size=1), and the result is similar to the one in the notebook. However, when I changed the sequence length to a smaller one (sequence_min_len=1, sequence_max_len=5), convergence became really slow (like the figure below), which is unexpected since shorter sequences should be learned faster. Do you have any idea why this happens and how to train on shorter sequences properly? Any suggestions are welcome.

opened by wangshaonan 2

Why create a new tensor?

Hi, dear author:

    def write(self, w, e, a):
        """write to memory (according to section 3.2)."""
        self.prev_mem = self.memory
        self.memory = Variable(torch.Tensor(self.batch_size, self.N, self.M))
        erase = torch.matmul(w.unsqueeze(-1), e.unsqueeze(1))
        add = torch.matmul(w.unsqueeze(-1), a.unsqueeze(1))
        self.memory = self.prev_mem * (1 - erase) + add

In your write method, I don't understand why you create a new Variable(torch.Tensor(self.batch_size, self.N, self.M)) and then assign the new value. Why not write it directly as follows:

    def write(self, w, e, a):
        """write to memory (according to section 3.2).""" 
        erase = torch.matmul(w.unsqueeze(-1), e.unsqueeze(1))
        add = torch.matmul(w.unsqueeze(-1), a.unsqueeze(1))
        self.memory = self.memory * (1 - erase) + add

opened by dragen1860 2
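
The two variants in this question can be compared directly. The sketch below rewrites them as free functions (the names `write_with_prealloc` and `write_direct` are hypothetical, and `torch.empty_like` stands in for the old `Variable(torch.Tensor(...))` allocation) and checks that they return the same memory. It only demonstrates that the forward results match; it does not explain why the original was written with the extra allocation.

```python
import torch


def write_with_prealloc(memory, w, e, a):
    """Variant from the question: allocate a fresh tensor, then overwrite it."""
    prev_mem = memory
    memory = torch.empty_like(prev_mem)  # placeholder whose contents are never read
    erase = torch.matmul(w.unsqueeze(-1), e.unsqueeze(1))
    add = torch.matmul(w.unsqueeze(-1), a.unsqueeze(1))
    memory = prev_mem * (1 - erase) + add
    return memory


def write_direct(memory, w, e, a):
    """Proposed variant: compute the new memory without the extra allocation."""
    erase = torch.matmul(w.unsqueeze(-1), e.unsqueeze(1))
    add = torch.matmul(w.unsqueeze(-1), a.unsqueeze(1))
    return memory * (1 - erase) + add


torch.manual_seed(0)
batch_size, N, M = 2, 5, 3
memory = torch.rand(batch_size, N, M)
w = torch.softmax(torch.rand(batch_size, N), dim=1)  # write weights over N slots
e = torch.sigmoid(torch.rand(batch_size, M))         # erase vector in (0, 1)
a = torch.rand(batch_size, M)                        # add vector

out_prealloc = write_with_prealloc(memory, w, e, a)
out_direct = write_direct(memory, w, e, a)
same = torch.equal(out_prealloc, out_direct)
```

Neither function modifies its `memory` argument in place, so the previous memory tensor remains available to autograd in both cases.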

How to change code when each sequence length is different

Dear sir: Sorry for my rude words. But I really do want to know how to change the code if the sequence lengths are different. It seems that the NTM needs all sequences to have the same length. I tried padding short sequences with zeros, but because the NTM uses attention, the prediction accuracy drops when they are zero-padded. I would like to know how to overcome this problem. Please help me. Best wishes to you.

opened by REDXXXX 1

What's the meaning of using memory?

Dear sir: I have read your code and I really appreciate your work, but I have some questions.

self.register_buffer('mem_bias',torch.Tensor(N,M)) # mem_bias is registered as a buffer, which means it is not updated

self.memory =self.mem_bias.clone().repeat(batch_size,1,1) # self.memory is created from mem_bias to match the batch_size

For each batch, we run init_sequence(), which resets the memory, and the reset function (self.batch_size = batch_size; self.memory = self.mem_bias.clone().repeat(batch_size, 1, 1))

just clears all the content in the memory and re-initializes it with mem_bias.
So what's the point of writing to and reading from memory? It is the same as mem_bias at the start of each batch, and mem_bias is not updated, which means it never changes. I could not figure this out and I would really appreciate it if you could answer my question.
opened by REDXXXX 1
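
One way to look at this question is to trace what happens to the memory inside a single sequence. The sketch below uses a hypothetical `TinyMemory` class built from the two quoted lines plus the write rule quoted elsewhere in these comments; it is an illustration under those assumptions, not the repository's module.

```python
import torch
from torch import nn


class TinyMemory(nn.Module):
    """Hypothetical stand-in for the memory module discussed above."""

    def __init__(self, N, M):
        super().__init__()
        # A buffer is saved with the model but is not a trainable parameter.
        self.register_buffer("mem_bias", torch.rand(N, M))

    def reset(self, batch_size):
        # Every sequence starts from the same initial contents.
        self.memory = self.mem_bias.clone().repeat(batch_size, 1, 1)

    def write(self, w, e, a):
        erase = torch.matmul(w.unsqueeze(-1), e.unsqueeze(1))
        add = torch.matmul(w.unsqueeze(-1), a.unsqueeze(1))
        self.memory = self.memory * (1 - erase) + add


torch.manual_seed(0)
mem = TinyMemory(N=4, M=3)
bias_before = mem.mem_bias.clone()

mem.reset(batch_size=1)
w = torch.tensor([[1.0, 0.0, 0.0, 0.0]])  # focus entirely on slot 0
e = torch.ones(1, 3)                       # erase slot 0 completely
a = torch.tensor([[1.0, 2.0, 3.0]])        # then add this vector
mem.write(w, e, a)
after_write = mem.memory.clone()

mem.reset(batch_size=1)                    # the next sequence starts fresh
after_reset = mem.memory.clone()
```

In this sketch the memory changes between the reset and the end of the sequence, so what was written at one time step can be read back at a later step of the same sequence. Only the starting contents are shared across sequences.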

Why do the read vector and memory need to be initialized?

Dear author: I found that you initialize the read vector and memory as:

		self.register_buffer('mem_bias', torch.Tensor(N, M))

		# Initialize memory bias
		stdev = 1 / (np.sqrt(N + M))
		nn.init.uniform_(self.mem_bias, -stdev, stdev)

and

init_r_bias = torch.randn(1, M).to('cuda') * 0.01
# the initial value of read vector is not optimized.
self.register_buffer("read{}_bias".format(self.num_read_heads), init_r_bias)

I wonder whether the initialization scheme makes a big difference, or whether I can just initialize everything with torch.zeros()?

opened by dragen1860 1

How to visualize the training process and draw these animations?

Dear author, I found your README very easy to follow and the animations of the training process vivid. I wonder how you drew these pictures and animations?

opened by dragen1860 1

Corrected the width of the circular convolution adjustment

The code concatenates 2 elements to each side but only needs to concatenate 1 to each side.

Tested with the following code

import torch
from torch.nn import functional as F
from random import randint


def _convolve_original(w, s):
    """Circular convolution: pad two elements per side, then trim the extras."""
    assert s.size(0) == 3
    t = torch.cat([w[-2:], w, w[:2]])
    c = F.conv1d(t.view(1, 1, -1), s.view(1, 1, -1)).view(-1)
    return c[1:-1]


def _convolve_new(w, s):
    """Circular convolution: pad one element per side, no trimming needed."""
    assert s.size(0) == 3
    t = torch.cat([w[-1:], w, w[:1]])
    c = F.conv1d(t.view(1, 1, -1), s.view(1, 1, -1)).view(-1)
    return c


for i in range(10000):
    N = randint(10, 1000)
    # torch.rand samples from U[0, 1), like the deprecated nn.init.uniform default
    w = torch.rand(N)
    s = torch.rand(3)

    # Compare with a tolerance rather than exact floating-point equality
    assert torch.allclose(_convolve_original(w, s), _convolve_new(w, s))

opened by JulesGM 1

error in copy-task-plots.ipynb

When I run this piece of code:

seq_len = 60
_, x, y = next(iter(dataloader(1, 1, 8, seq_len, seq_len)))
result = evaluate(model.net, model.criterion, x, y)
y_out = result['y_out']

the following error is raised:

IndexError                                Traceback (most recent call last)
<ipython-input-41-127bd44fb490> in <module>()
      1 seq_len = 60
      2 _, x, y = next(iter(dataloader(1, 1, 8, seq_len, seq_len)))
----> 3 result = evaluate(model.net, model.criterion, x, y)
      4 y_out = result['y_out']

D:\GithubProjs\pytorch-ntm-master\train.py in evaluate(net, criterion, X, Y)
    151 
    152     result = {
--> 153         'loss': loss.data[0],
    154         'cost': cost / batch_size,
    155         'y_out': y_out,

IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

How can I solve it?

opened by bw-xu 2
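
The error message itself points at the fix: a 0-dim tensor has to be converted with `.item()` instead of being indexed, which is also the change listed in the PyTorch 0.4.0 update above. A minimal sketch, with a made-up loss value and a placeholder `cost` entry, looks like this:

```python
import json

import torch

loss = torch.tensor(0.25)      # 0-dim tensor, like the value a loss function returns
loss_value = loss.item()       # plain Python float; loss.data[0] no longer works

result = {"loss": loss_value, "cost": 3.0}  # "cost" is a placeholder value
encoded = json.dumps(result)   # succeeds because every value is a plain Python number
```

Converting to a Python number before storing the result also avoids the "Object of type 'Tensor' is not JSON serializable" error when such a dictionary is written out as JSON.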

Why use vague storage

Dear sir: Looking at the code, each input batch of shape B x C is stored as B x N x M. The N weights sum to 1, so C is effectively split across N pieces. This is called vague storage, but I don't know why this kind of storage is used. What is the advantage of this storage?

opened by REDXXXX 0
