End-To-End Memory Network using Tensorflow

Overview

MemN2N

Tensorflow implementation of End-To-End Memory Networks with an sklearn-like interface. Tasks are from the bAbI dataset.

(Figure: MemN2N architecture)

Get Started

git clone git@github.com:domluna/memn2n.git

mkdir ./memn2n/data/
cd ./memn2n/data/
wget http://www.thespermwhale.com/jaseweston/babi/tasks_1-20_v1-2.tar.gz
tar xzvf ./tasks_1-20_v1-2.tar.gz

cd ../
python single.py

Examples

Running a single bAbI task: single.py

Running a joint model on all bAbI tasks: joint.py

These files also serve as good usage examples.
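For orientation, here is a rough sketch of the sklearn-like interface as used in single.py. The constructor arguments and the batch_fit/predict method names are assumptions drawn from the repo layout; the data-loading step is elided, so check single.py for the real details.

    import tensorflow as tf
    from memn2n import MemN2N

    # trainS, trainQ, trainA, testS, testQ are vectorized bAbI arrays
    # produced by the data_utils helpers (step elided; see single.py).
    with tf.Session() as sess:
        model = MemN2N(batch_size=32, vocab_size=vocab_size,
                       sentence_size=sentence_size, memory_size=memory_size,
                       embedding_size=20, hops=3, session=sess)
        model.batch_fit(trainS, trainQ, trainA)     # one gradient step on a batch
        predictions = model.predict(testS, testQ)   # predicted answer ids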

Requirements

  • tensorflow 1.0
  • scikit-learn 0.17.1
  • six 1.10.0

Single Task Results

A task passes if it reaches 95%+ testing accuracy. Results are measured on single tasks using the 1k data.

Pass: 1,4,12,15,20

Several other tasks have 80%+ testing accuracy.

A stochastic gradient descent optimizer was used with an annealed learning-rate schedule, as specified in Section 4.2 of End-To-End Memory Networks.
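Concretely, the schedule described in the paper starts the learning rate at 0.01 and halves it every 25 epochs until epoch 100. A minimal sketch of such a schedule (variable names here are illustrative, not the ones used in single.py):

    def annealed_learning_rate(epoch, initial_lr=0.01, anneal_every=25, anneal_rate=0.5):
        # Halve the learning rate every `anneal_every` epochs: 0.01, 0.005, 0.0025, ...
        return initial_lr * (anneal_rate ** (epoch // anneal_every))

    # e.g. epochs 0-24 -> 0.01, 25-49 -> 0.005, 50-74 -> 0.0025, 75-99 -> 0.00125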

The following params were used:

  • epochs: 100
  • hops: 3
  • embedding_size: 20

| Task | Training Accuracy | Validation Accuracy | Testing Accuracy |
|------|-------------------|---------------------|------------------|
| 1    | 1.0  | 1.0  | 1.0  |
| 2    | 1.0  | 0.86 | 0.83 |
| 3    | 1.0  | 0.64 | 0.54 |
| 4    | 1.0  | 0.99 | 0.98 |
| 5    | 1.0  | 0.94 | 0.87 |
| 6    | 1.0  | 0.97 | 0.92 |
| 7    | 1.0  | 0.89 | 0.84 |
| 8    | 1.0  | 0.93 | 0.86 |
| 9    | 1.0  | 0.86 | 0.90 |
| 10   | 1.0  | 0.80 | 0.78 |
| 11   | 1.0  | 0.92 | 0.84 |
| 12   | 1.0  | 1.0  | 1.0  |
| 13   | 0.99 | 0.94 | 0.90 |
| 14   | 1.0  | 0.97 | 0.93 |
| 15   | 1.0  | 1.0  | 1.0  |
| 16   | 0.81 | 0.47 | 0.44 |
| 17   | 0.76 | 0.65 | 0.52 |
| 18   | 0.97 | 0.96 | 0.88 |
| 19   | 0.40 | 0.17 | 0.13 |
| 20   | 1.0  | 1.0  | 1.0  |

Joint Training Results

Pass: 1,6,9,10,12,13,15,20

Again, a stochastic gradient descent optimizer was used with an annealed learning-rate schedule, as specified in Section 4.2 of End-To-End Memory Networks.

The following params were used:

  • epochs: 60
  • hops: 3
  • embedding_size: 40

| Task | Training Accuracy | Validation Accuracy | Testing Accuracy |
|------|-------------------|---------------------|------------------|
| 1    | 1.0  | 0.99 | 0.999 |
| 2    | 1.0  | 0.84 | 0.849 |
| 3    | 0.99 | 0.72 | 0.715 |
| 4    | 0.96 | 0.86 | 0.851 |
| 5    | 1.0  | 0.92 | 0.865 |
| 6    | 1.0  | 0.97 | 0.964 |
| 7    | 0.96 | 0.87 | 0.851 |
| 8    | 0.99 | 0.89 | 0.898 |
| 9    | 0.99 | 0.96 | 0.96  |
| 10   | 1.0  | 0.96 | 0.928 |
| 11   | 1.0  | 0.98 | 0.93  |
| 12   | 1.0  | 0.98 | 0.982 |
| 13   | 0.99 | 0.98 | 0.976 |
| 14   | 1.0  | 0.81 | 0.877 |
| 15   | 1.0  | 1.0  | 0.983 |
| 16   | 0.64 | 0.45 | 0.44  |
| 17   | 0.77 | 0.64 | 0.547 |
| 18   | 0.85 | 0.71 | 0.586 |
| 19   | 0.24 | 0.07 | 0.104 |
| 20   | 1.0  | 1.0  | 0.996 |

Notes

Single-task results are from 10 repeated trials of the single-task model across all 20 tasks with different random initializations. The performance of the model with the lowest validation accuracy for each task is shown in the table above.

Joint training results are from 10 repeated trials of the joint model across all tasks. The performance of the single model whose validation accuracy passed the most tasks (>= 0.95) is shown in the table above (joint_scores_run2.csv). The scores from all 10 runs are located in the results/ directory.

Comments
  • Position Encoding

    Hi domluna, how did you get the equation in position_encoding? It seems different from the one in the paper, unless I made a silly algebra mistake... Even then, is there an advantage to splitting up the equation the way you wrote it? Some sort of optimization?
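    For reference, a direct transcription of the position-encoding formula from Section 4.1 of the paper, with 1-indexed j over words and k over embedding dimensions; whether the repo's rearranged form is algebraically equivalent is exactly the question here:

    import numpy as np

    def paper_position_encoding(sentence_size, embedding_size):
        # l_kj = (1 - j/J) - (k/d) * (1 - 2j/J), as in Section 4.1 of the paper.
        J, d = sentence_size, embedding_size
        encoding = np.zeros((J, d), dtype=np.float32)
        for j in range(1, J + 1):
            for k in range(1, d + 1):
                encoding[j - 1, k - 1] = (1 - j / J) - (k / d) * (1 - 2 * j / J)
        return encoding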

    opened by andrewjylee 4
  • Changes to match PE baseline from paper

    • Implement adjacent weight sharing of A & C
    • Fix temporal encoding (performance on tasks 2 & 3 now comparable with paper)
    • Fix position encoding
    • Switch optimizer to SGD with annealed learning rate
    • Update README results
    opened by akandykeller 2
  • Compare Results

    Hello @domluna! Thanks for your nice scripts. I have one question about this model. Do you know why some task results here are very different from the Facebook Matlab ones (like tasks 11, 13, 16)? Is it because of the model initialization? https://github.com/vinhkhuc/MemN2N-babi-python/tree/master/bechmarks Thank you for your response :)

    opened by jasonwu0731 1
  • running joint.py throws an error

    Traceback (most recent call last):
      File "joint.py", line 121, in <module>
        for start in range(0, n_train, n_train/20):
    TypeError: 'float' object cannot be interpreted as an integer

    This shows up after a few runs. single.py runs fine. Any idea why this could happen?

    The full log is:

    (mem-tf) skc@Ultron:~/Projects/qa-mem/tf-memn2n$ python joint.py
    Started Joint Model
    /Users/skc/anaconda/envs/mem-tf/lib/python3.5/re.py:203: FutureWarning: split() requires a non-empty pattern match.
      return _compile(pattern, flags).split(string, maxsplit)
    Longest sentence length 11
    Longest story length 228
    Average story length 9
    Training Size 18000
    Validation Size 2000
    Testing Size 20000
    (18000, 50, 11) (2000, 50, 11) (20000, 50, 11)
    (18000, 11) (2000, 11) (20000, 11)
    (18000, 175) (2000, 175) (20000, 175)
    WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
    WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
    WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
    WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
    WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
    WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
    WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
    Traceback (most recent call last):
      File "joint.py", line 121, in <module>
        for start in range(0, n_train, n_train/20):
    TypeError: 'float' object cannot be interpreted as an integer

    opened by skckompella 1
  • Add tutorial to download data before running

    I followed the instructions to run single.py, but it fails. It would be better to add a tutorial about how and where to download the data.

    $ python ./single.py
    Started Task: 1
    Traceback (most recent call last):
      File "./single.py", line 32, in <module>
        train, test = load_task(FLAGS.data_dir, FLAGS.task_id)
      File "/home/tobe/code/memn2n/data_utils.py", line 14, in load_task
        files = os.listdir(data_dir)
    OSError: [Errno 2] No such file or directory: 'data/tasks_1-20_v1-2/en/'
    
    opened by tobegit3hub 1
  • fix 0 logits in input module

    Currently, because the nil embedding is 0 (which is fine) and because we pad to a specified memory size, we tend to have a bunch of memories which are empty [0 0 ... 0]. The problem is that we feed these into a softmax as-is, and exp(0) = 1, so on the output the empty memories get a uniform probability. This is problematic because it alters the probabilities of non-empty memories.

    So the solution is to add a largish negative number to empty memories before the softmax is applied. Then the exp() of the value will be 0, or close enough.

    This issue is particularly evident in task 4, where each story consists of 2 sentences. If we make the memory size large, say 50 (only 2 is needed), 2 things tend to occur:

    1. We converge at a much slower rate
    2. We get a worse error rate

    An alternative solution would be to make the batch size 1 (at least at a low level; a higher-level API can make this nicer). This way the memory can be of any size, since nothing in the underlying algorithm relies on the memory being a fixed size (at least I think this is the case, have to double check!).
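    A minimal sketch of the masking idea (a hypothetical helper, not the repo's actual code): add a large negative value to the logits of empty memory slots so that, after the softmax, their probability is effectively zero.

    import tensorflow as tf

    def masked_attention(logits, is_empty):
        # is_empty: float32 tensor, 1.0 where the memory slot is all padding, else 0.0.
        # Pushing empty slots toward -inf makes exp() of them ~0 after the softmax,
        # so they no longer steal probability mass from real memories.
        return tf.nn.softmax(logits + is_empty * -1e9)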

    opened by domluna 1
  • Difference between code and paper

    Hi, Thank you for your codes! It is very helpful.

    I noticed a difference between your code and the original paper. The paper uses a separate embedding to get c for each story, and directly adds o and u to get the input of the prediction layer (or the u for the next layer in the multi-hop case). In your code, c is given the same value as m rather than being recomputed, and o is multiplied by a matrix you called H before being added to u. I am wondering why you do it this way? I haven't tested the difference. Will it influence the performance?
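    For comparison, one memory hop as written in the paper (a numpy sketch with illustrative shapes; the H mapping corresponds to the layer-wise weight-tying variant, where u_{k+1} = H u_k + o_k, while adjacent tying simply uses u_{k+1} = u_k + o_k):

    import numpy as np

    def softmax(x):
        e = np.exp(x - np.max(x))
        return e / e.sum()

    def memory_hop(u, m, c, H):
        # u: [d] controller state; m, c: [memory_size, d] input/output memories; H: [d, d].
        p = softmax(m @ u)   # attention over memories: p_i = softmax(u^T m_i)
        o = p @ c            # response vector: o = sum_i p_i * c_i
        return H @ u + o     # next controller state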

    opened by ZijiaLewisLu 0
  • tokenize function code in data_utils.py is incorrect

    Given the intended test behaviour:

    >>> tokenize('Bob dropped the apple. Where is the apple?')
        ['Bob', 'dropped', 'the', 'apple', '.', 'Where', 'is', 'the', 'apple', '?']
    

    we should write it like this:

    def tokenize(sent):
        # Split into word tokens (allowing internal apostrophes) and keep punctuation as separate tokens.
        return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sent)
    
    opened by zpengc 0
  • Support for Ragged/Jagged arrays

    On this line, it is mentioned that there is no support for jagged arrays, but the new Tensorflow v2.1.0 has introduced RaggedTensor. It would be nice if support for this feature could be provided in the current codebase.
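    For context, a minimal example of TensorFlow's RaggedTensor (TF >= 2.1), which can hold stories with different numbers of sentences without padding to a fixed memory size. This is illustrative only, since the codebase itself targets TF 1.x.

    import tensorflow as tf  # requires TensorFlow >= 2.1

    # Two stories with different numbers of sentences, no padding required.
    stories = tf.ragged.constant([
        [[1, 2, 3], [4, 5, 6]],  # story with 2 sentences
        [[7, 8, 9]],             # story with 1 sentence
    ])
    print(stories)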

    opened by jaibhageria 0
  • Puzzled about the attention part

    m_C = tf.reduce_sum(m_emb_C * self._encoding, 2)
    c_temp = tf.transpose(m_C, [0, 2, 1])

    In this part, the first line with reduce_sum should reduce the tensor to 2 dimensions, so I don't think the transpose in the second line will work. I am not sure if I am missing something.

    opened by JustinLin610 1
  • Found joint.py errors

    n_train/20, n_val/20, and n_test/20 cause errors in Python 3.

    I modified n_train/20 -> n_train//20, n_val/20 -> n_val//20, and n_test/20 -> n_test//20, and it works.

    opened by donghyeonk 1