Neural Turing Machines (NTM) - PyTorch Implementation

Overview

PyTorch Neural Turing Machine (NTM)

PyTorch implementation of Neural Turing Machines (NTM).

An NTM is a memory-augmented neural network (a network attached to external memory) in which the interactions with the external memory (addressing, reading, writing) are done through differentiable transformations. The whole network is therefore end-to-end differentiable and trainable by a gradient-based optimizer.

The NTM processes input sequences, much like an LSTM, but with additional benefits: (1) the external memory makes it easier for the network to learn algorithmic tasks, and (2) it provides larger capacity without increasing the number of trainable parameters.

The external memory allows the NTM to learn algorithmic tasks that are much harder for an LSTM to learn, and to maintain internal state for much longer than a traditional LSTM.
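To make the addressing idea concrete, here is a minimal, illustrative sketch (not the repository's exact code) of differentiable content-based addressing: a key emitted by the controller is compared against every memory row, and a softmax turns the similarities into soft read weights, so gradients flow through the memory access.

    import torch
    import torch.nn.functional as F

    def content_addressing(memory, key, beta):
        """memory: (N, M), key: (M,), beta: positive sharpness scalar."""
        similarity = F.cosine_similarity(memory, key.unsqueeze(0), dim=1)  # (N,)
        return F.softmax(beta * similarity, dim=0)                         # soft weights over the N rows

    memory = torch.randn(128, 20)                 # e.g. N=128 locations of M=20 values each
    w = content_addressing(memory, torch.randn(20), beta=torch.tensor(5.0))
    read_vector = w @ memory                      # a "read" is a convex combination of memory rows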

A PyTorch Implementation

This repository implements a vanilla NTM in a straightforward way. The following architecture is used:

NTM Architecture

Features

  • Batch learning support
  • Numerically stable
  • Flexible head configuration - use X read heads and Y write heads and specify the order of operation
  • Copy and repeat-copy experiments agree with the paper

Copy Task

The Copy task tests the NTM's ability to store and recall a long sequence of arbitrary information. The input to the network is a random sequence of bits, ending with a delimiter. The sequence lengths are randomised between 1 and 20.
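As an illustration only (the repository's dataloader may differ in detail), a copy-task sample can be built roughly like this: `seq_len` random bit vectors of width 8, plus one extra time step and channel for the delimiter.

    import torch

    def make_copy_sample(seq_len, width=8):
        seq = torch.bernoulli(torch.full((seq_len, width), 0.5))   # random bits
        inp = torch.zeros(seq_len + 1, width + 1)                  # extra step/channel for the delimiter
        inp[:seq_len, :width] = seq
        inp[seq_len, width] = 1.0                                   # delimiter marks the end of the input
        return inp, seq.clone()                                     # target: reproduce the sequence

    x, y = make_copy_sample(torch.randint(1, 21, (1,)).item())      # lengths randomised between 1 and 20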

Training

Training convergence for the copy task using 4 different seeds (see the notebook for details)

NTM Convergence

The following plot shows the cost per sequence length during training. The network was trained with seed=10 and shows fast convergence. Other seeds may not perform as well but should converge in less than 30K iterations.

NTM Convergence

Evaluation

Here is an animated GIF that shows how the model generalizes. The model was evaluated after every 500 training samples, using the target sequence shown in the upper part of the image. The bottom part shows the network output at each training stage.

Copy Task

The following is the same, but with sequence length = 80. Note that the network was trained with sequences of lengths 1 to 20.

Copy Task


Repeat Copy Task

The Repeat Copy task tests whether the NTM can learn a simple nested function, invoking it by learning to execute a for loop. The input to the network is a random sequence of bits, followed by a delimiter and a scalar value that represents the number of repetitions to output. The number of repetitions is normalized to have zero mean and a variance of one (as in the paper). Both the length of the sequence and the number of repetitions are randomised between 1 and 10.
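For illustration (again, a sketch under assumptions rather than the repository's exact dataloader), a repeat-copy sample adds a second extra channel carrying the repetition count, normalized over its 1..10 range:

    import torch

    def make_repeat_copy_sample(seq_len, reps, width=8, max_reps=10):
        reps_mean = (max_reps + 1) / 2.0
        reps_std = torch.arange(1.0, max_reps + 1).std()             # std of the 1..max_reps range
        inp = torch.zeros(seq_len + 2, width + 2)                    # extra channels: delimiter, rep count
        inp[:seq_len, :width] = torch.bernoulli(torch.full((seq_len, width), 0.5))
        inp[seq_len, width] = 1.0                                    # delimiter
        inp[seq_len + 1, width + 1] = (reps - reps_mean) / reps_std  # normalized number of repetitions
        target = inp[:seq_len, :width].repeat(reps, 1)               # the sequence, repeated `reps` times
        return inp, target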

Training

Training convergence for the repeat-copy task using 4 different seeds (see the notebook for details)

NTM Convergence

Evaluation

The following image shows the input presented to the network: a sequence of bits + delimiter + num-reps scalar. Specifically, the sequence length here is eight and the number of repetitions is five.

Repeat Copy Task

And here is the output the network predicted:

Repeat Copy Task

Here's an animated GIF that shows how the network learns to predict the targets. Specifically, the network was evaluated at each checkpoint saved during training, using the same input sequence.

Repeat Copy Task

Installation

The NTM can be used as a reusable module, although it is not currently packaged.

  1. Clone the repository
  2. Install PyTorch
  3. pip install -r requirements.txt

Usage

Execute ./train.py

usage: train.py [-h] [--seed SEED] [--task {copy,repeat-copy}] [-p PARAM]
                [--checkpoint-interval CHECKPOINT_INTERVAL]
                [--checkpoint-path CHECKPOINT_PATH]
                [--report-interval REPORT_INTERVAL]

optional arguments:
  -h, --help            show this help message and exit
  --seed SEED           Seed value for RNGs
  --task {copy,repeat-copy}
                        Choose the task to train (default: copy)
  -p PARAM, --param PARAM
                        Override model params. Example: "-pbatch_size=4
                        -pnum_heads=2"
  --checkpoint-interval CHECKPOINT_INTERVAL
                        Checkpoint interval (default: 1000). Use 0 to disable
                        checkpointing
  --checkpoint-path CHECKPOINT_PATH
                        Path for saving checkpoint data (default: './')
  --report-interval REPORT_INTERVAL
                        Reporting interval
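For example, to train the repeat-copy task with a fixed seed and parameter overrides, an invocation might look like this (the -p names follow the example given in the help text above):

    ./train.py --task repeat-copy --seed 10 -pbatch_size=4 -pnum_heads=2
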
Comments
  • Strange fluctuation in the curves even after training on a large number of sequences

    (random seed=10) As the plots show, after 120,000 sequences there is still some fluctuation in the cost, which does not seem to match the results in your experiments or the original authors'. What could the reasons be? How can I cope with this? Thanks a lot.

    opened by marcwww 6
  • Removed the unnecessary softplus in NTMHeadBase._address_memory

    Removed the softplus in the softmax:

            s = F.softmax(F.softplus(s), dim=1)
    

    Softmax already constrains the values to (0, 1), so the softplus doesn't achieve anything. PyTorch's softmax implementation is already numerically stable, so that is not a concern either.

    opened by JulesGM 4
  • Different results between testing in the mid terms of training and at the end of training

    Dear author,

    I've forked a repo (at https://github.com/marcwww/pytorch-ntm) from your work, mainly to test the model on longer sequences (for example, training on sequences of length 1 to 10 and testing on sequences of length 11 to 20).

    The issue is that the final test result after training without any intermediate testing differs from the final result when testing is also performed during training. The experiment setting of the repo is the latter one (at https://github.com/marcwww/pytorch-ntm/blob/1d0595e165a6790219df76e0b7f13b48e406b4d9/train_test.py#L236).

    In the forked repo, test batches are sampled in the same way as training batches (at https://github.com/marcwww/pytorch-ntm/blob/1d0595e165a6790219df76e0b7f13b48e406b4d9/tasks/copytask_test.py#L16). I have also tried to check whether the difference comes from the intertwined sampling of training and testing by loading a pre-generated test set, and it does not help.

    Could you please help me with this? Thanks a lot.

    opened by marcwww 3
  • Various PyTorch 0.4.0 updates

    This commit fixes a few deprecation warnings thrown by PyTorch 0.4.0.

    Merge of Variable and Tensor:

    • change Variable(a_tensor) to a_tensor

    Support for 0-dimensional PyTorch scalars:

    • change loss.data[0] to loss.item()

    Naming conventions in nn.init:

    • change nn.init.constant to nn.init.constant_
    • change nn.init.uniform to nn.init.uniform_
    • change nn.init.xavier_uniform to nn.init.xavier_uniform_

    This pull request does not fix TypeError: Object of type 'Tensor' is not JSON serializable thrown by save_checkpoint on line 83 of train.py, which seems to be an issue new to PyTorch 0.4.0.

    Thanks! This is a great implementation!!! Mark

    opened by marikgoldstein 3
  • Convergence is really slow with copy task when sequence length is smaller

    Hi,

    I have tried to run the copy task with the default parameters (controller_size=100, controller_layers=1, num_heads=1, sequence_width=8, sequence_min_len=1, sequence_max_len=20, memory_n=128, memory_m=20, batch_size=1), and the result is similar to the one in the notebook. However, when I changed the sequence length to a smaller range (sequence_min_len=1, sequence_max_len=5), convergence is really slow (see the figure below), which is unexpected since shorter sequences should be easier to learn. Do you have any idea why this happens and how to train on shorter sequences properly? Any suggestion is welcome.

    figure_1

    opened by wangshaonan 2
  • Why create a new tensor?

    Hi, dear author:

        def write(self, w, e, a):
            """write to memory (according to section 3.2)."""
            self.prev_mem = self.memory
            self.memory = Variable(torch.Tensor(self.batch_size, self.N, self.M))
            erase = torch.matmul(w.unsqueeze(-1), e.unsqueeze(1))
            add = torch.matmul(w.unsqueeze(-1), a.unsqueeze(1))
            self.memory = self.prev_mem * (1 - erase) + add
    

    In your write method, I don't understand why you create a new Variable(torch.Tensor(self.batch_size, self.N, self.M)) and then assign the new value. Why not write the following directly:

        def write(self, w, e, a):
            """write to memory (according to section 3.2).""" 
            erase = torch.matmul(w.unsqueeze(-1), e.unsqueeze(1))
            add = torch.matmul(w.unsqueeze(-1), a.unsqueeze(1))
            self.memory = self.memory * (1 - erase) + add
    
    opened by dragen1860 2
  • How to change code when each sequence length is different

    Dear sir: Sorry for my rude words, but I really want to know how to change the code when sequence lengths differ. It seems that the NTM needs all sequences to have the same length. I tried padding short sequences with zeros, but the NTM uses attention, and when padding with zeros the prediction accuracy drops. How can I overcome this problem? Please help me. Best wishes to you.

    opened by REDXXXX 1
  • What's the meaning of using memory?

    Dear sir: I have read your code and I really appreciate your work, but I have a few questions.

    1. self.register_buffer('mem_bias', torch.Tensor(N, M)) # mem_bias is registered as a buffer, which means it will not be updated
    2. self.memory = self.mem_bias.clone().repeat(batch_size, 1, 1) # self.memory is created from mem_bias to match the batch_size
    3. For each batch we run init_sequence(), which resets the memory: self.batch_size = batch_size; self.memory = self.mem_bias.clone().repeat(batch_size, 1, 1)

    This just clears all the content of the memory and re-initializes it with mem_bias. So what is the point of writing to and reading from memory? It is reset to mem_bias for each batch, and mem_bias is not updated, which means it never changes. I just cannot figure this out and would really appreciate it if you could answer my question.

    opened by REDXXXX 1
  • Why is it necessary to initialize the read vector and memory?

    Dear author: I found that you initialize the read vector and memory as:

    		self.register_buffer('mem_bias', torch.Tensor(N, M))
    
    		# Initialize memory bias
    		stdev = 1 / (np.sqrt(N + M))
    		nn.init.uniform_(self.mem_bias, -stdev, stdev)
    

    and

    init_r_bias = torch.randn(1, M).to('cuda') * 0.01
    # the initial value of read vector is not optimized.
    self.register_buffer("read{}_bias".format(self.num_read_heads), init_r_bias)
    

    I wonder whether the initialization scheme makes a big difference, or whether I can just initialize everything with torch.zeros()?

    opened by dragen1860 1
  • How to visualize the training process and draw these animations?

    Dear author, I found your README very easy to follow, and the animations of the training process are vivid. I wonder how you draw these pictures and animations?

    opened by dragen1860 1
  • Corrected the width of the circular convolution adjustment

    The code concatenates 2 elements to each side, but it only needs to concatenate 1 to each side.

    Tested with the following code

    import torch
    import torch.nn
    from torch.nn import functional as F
    from torch.autograd import Variable
    from random import randint
    
    def _convolve_original(w, s):
        """Circular convolution implementation."""
        assert s.size(0) == 3
        t = torch.cat([w[-2:], w, w[:2]])
        c = F.conv1d(t.view(1, 1, -1), s.view(1, 1, -1)).view(-1)
        return c[1:-1]
    
    
    def _convolve_new(w, s):
        """Circular convolution implementation."""
        assert s.size(0) == 3
        t = torch.cat([w[-1:], w, w[:1]])
        c = F.conv1d(t.view(1, 1, -1), s.view(1, 1, -1)).view(-1)
        return c
    
    
    for i in range(10000):
        N = randint(10, 1000)
        w = Variable(torch.zeros([N]))
        torch.nn.init.uniform(w)
    
        s = Variable(torch.zeros([3]))
        torch.nn.init.uniform(s)
    
    
        assert (_convolve_original(w, s) == _convolve_new(w, s)).all()
    
    
    opened by JulesGM 1
  • error in copy-task-plots.ipynb

    When running this code snippet:

    seq_len = 60
    _, x, y = next(iter(dataloader(1, 1, 8, seq_len, seq_len)))
    result = evaluate(model.net, model.criterion, x, y)
    y_out = result['y_out']
    

    the following error appears:

    IndexError                                Traceback (most recent call last)
    <ipython-input-41-127bd44fb490> in <module>()
          1 seq_len = 60
          2 _, x, y = next(iter(dataloader(1, 1, 8, seq_len, seq_len)))
    ----> 3 result = evaluate(model.net, model.criterion, x, y)
          4 y_out = result['y_out']
    
    D:\GithubProjs\pytorch-ntm-master\train.py in evaluate(net, criterion, X, Y)
        151 
        152     result = {
    --> 153         'loss': loss.data[0],
        154         'cost': cost / batch_size,
        155         'y_out': y_out,
    
    IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number
    

    How can this be solved?

    opened by bw-xu 2
  • Why use vague storage

    Dear sir: Looking at the code, each input batch of shape B x C is stored as B x N x M, where the N weights sum to 1, so C is effectively split into N weighted pieces. This is the "vague" (blurry) storage, but I don't understand why this kind of storage is used. What is its advantage?

    opened by REDXXXX 0