End-To-End Memory Network using Tensorflow

Dominique Luna

Last update: Oct 27, 2022

Overview

MemN2N

Implementation of End-To-End Memory Networks with a sklearn-like interface using Tensorflow. Tasks are from the bAbI dataset.

Get Started

git clone git@github.com:domluna/memn2n.git

mkdir ./memn2n/data/
cd ./memn2n/data/
wget http://www.thespermwhale.com/jaseweston/babi/tasks_1-20_v1-2.tar.gz
tar xzvf ./tasks_1-20_v1-2.tar.gz

cd ../
python single.py

Examples

Running a single bAbI task

Running a joint model on all bAbI tasks

These files are also a good example of usage.

Requirements

tensorflow 1.0
scikit-learn 0.17.1
six 1.10.0

Single Task Results

A task passes if it reaches at least 95% testing accuracy. Results are measured on single tasks using the 1k data.

Pass: 1,4,12,15,20

Several other tasks have 80%+ testing accuracy.

A stochastic gradient descent optimizer was used with an annealed learning rate schedule, as specified in Section 4.2 of End-To-End Memory Networks.

The following params were used:

epochs: 100
hops: 3
embedding_size: 20
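
As a rough illustration of the annealed schedule mentioned above, the sketch below halves the learning rate at fixed epoch intervals. The starting rate of 0.01 and the 25-epoch interval are what Section 4.2 of the paper describes; whether this repository uses exactly those constants is an assumption, and `annealed_learning_rate` is a hypothetical helper, not a function from the codebase.

```python
def annealed_learning_rate(epoch, initial_rate=0.01, anneal_every=25):
    """Return the learning rate for a 1-indexed epoch.

    Assumed schedule: start at `initial_rate` and halve it after every
    `anneal_every` epochs.
    """
    halvings = (epoch - 1) // anneal_every
    return initial_rate / (2 ** halvings)


# Rates at the start of each 25-epoch block of a 100-epoch run.
schedule = {epoch: annealed_learning_rate(epoch) for epoch in (1, 26, 51, 76)}
```

With these assumed constants, a 100-epoch run uses four distinct rates, each half of the previous one.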

Task	Training Accuracy	Validation Accuracy	Testing Accuracy
1	1.0	1.0	1.0
2	1.0	0.86	0.83
3	1.0	0.64	0.54
4	1.0	0.99	0.98
5	1.0	0.94	0.87
6	1.0	0.97	0.92
7	1.0	0.89	0.84
8	1.0	0.93	0.86
9	1.0	0.86	0.90
10	1.0	0.80	0.78
11	1.0	0.92	0.84
12	1.0	1.0	1.0
13	0.99	0.94	0.90
14	1.0	0.97	0.93
15	1.0	1.0	1.0
16	0.81	0.47	0.44
17	0.76	0.65	0.52
18	0.97	0.96	0.88
19	0.40	0.17	0.13
20	1.0	1.0	1.0
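
The pass list can be cross-checked against the testing accuracy column of the table above. The snippet below is only a sketch of that check; the variable names are illustrative and do not come from the repository.

```python
# Testing accuracy per task, copied from the single-task table above.
testing_accuracy = {
    1: 1.0, 2: 0.83, 3: 0.54, 4: 0.98, 5: 0.87,
    6: 0.92, 7: 0.84, 8: 0.86, 9: 0.90, 10: 0.78,
    11: 0.84, 12: 1.0, 13: 0.90, 14: 0.93, 15: 1.0,
    16: 0.44, 17: 0.52, 18: 0.88, 19: 0.13, 20: 1.0,
}

PASS_THRESHOLD = 0.95  # a task passes at 95%+ testing accuracy

# Tasks that meet the pass threshold.
passed = sorted(
    task for task, acc in testing_accuracy.items() if acc >= PASS_THRESHOLD
)

# Tasks that reach 80%+ testing accuracy without passing.
near_miss = sorted(
    task for task, acc in testing_accuracy.items()
    if 0.80 <= acc < PASS_THRESHOLD
)
```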

Joint Training Results

Pass: 1,6,9,10,12,13,15,20

Again, a stochastic gradient descent optimizer was used with an annealed learning rate schedule, as specified in Section 4.2 of End-To-End Memory Networks.

The following params were used:

epochs: 60
hops: 3
embedding_size: 40

Task	Training Accuracy	Validation Accuracy	Testing Accuracy
1	1.0	0.99	0.999
2	1.0	0.84	0.849
3	0.99	0.72	0.715
4	0.96	0.86	0.851
5	1.0	0.92	0.865
6	1.0	0.97	0.964
7	0.96	0.87	0.851
8	0.99	0.89	0.898
9	0.99	0.96	0.96
10	1.0	0.96	0.928
11	1.0	0.98	0.93
12	1.0	0.98	0.982
13	0.99	0.98	0.976
14	1.0	0.81	0.877
15	1.0	1.0	0.983
16	0.64	0.45	0.44
17	0.77	0.64	0.547
18	0.85	0.71	0.586
19	0.24	0.07	0.104
20	1.0	1.0	0.996

Notes

Single task results are from 10 repeated trials of the single task model across all 20 tasks with different random initializations. The performance of the model with the lowest validation accuracy for each task is shown in the table above.

Joint training results are from 10 repeated trials of the joint model across all tasks. The performance of the single model whose validation accuracy passed the most tasks (>= 0.95) is shown in the table above (joint_scores_run2.csv). The scores from all 10 runs are located in the results/ directory.

Comments

Position Encoding

Hi domluna, how did you get the equation in position_encoding? It seems different from the one in the paper, unless I made a silly algebra mistake... Even then, is there an advantage in splitting the equation up the way you wrote it? Some sort of optimization?
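
For reference, here is a minimal sketch of the position encoding as the paper states it, l_kj = (1 - j/J) - (k/d)(1 - 2j/J), with 1-based word position j, embedding index k, sentence length J and embedding dimension d. This follows the paper only; it is not the repository's position_encoding function, and the name `paper_position_encoding` is made up for this example.

```python
def paper_position_encoding(sentence_size, embedding_size):
    """Build a sentence_size x embedding_size table of weights l_kj.

    Row j-1 holds the weights for word position j (1-based),
    column k-1 holds the weight for embedding dimension k (1-based).
    """
    J, d = sentence_size, embedding_size
    return [
        [(1 - j / J) - (k / d) * (1 - 2 * j / J) for k in range(1, d + 1)]
        for j in range(1, J + 1)
    ]


# Small example: 4 words per sentence, 2 embedding dimensions.
encoding = paper_position_encoding(4, 2)
```

Comparing such a table entry by entry with the output of the repository's function is one way to check whether two differently written formulas agree.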

opened by andrewjylee 4
Changes to match PE baseline from paper
Implement adjacent weight sharing of A & C

Fix temporal encoding (performance on tasks 2 & 3 now comparable with paper)

Fix position encoding

Switch optimizer to SGD with annealed learning rate

Update README results
opened by akandykeller 2
Compare Results

Hello @domluna! Thanks for your nice scripts. I have one question about this model. Do you know why some task results here are very different from the Facebook matlab one (like tasks 11, 13, 16)? Is it because of the model initialization? https://github.com/vinhkhuc/MemN2N-babi-python/tree/master/bechmarks Thank you for your response :)

opened by jasonwu0731 1
running joint.py throws an error

Traceback (most recent call last):
  File "joint.py", line 121, in <module>
    for start in range(0, n_train, n_train/20):
TypeError: 'float' object cannot be interpreted as an integer

This shows up after a few runs. single.py runs fine. Any idea why this could happen?

The full log is:

(mem-tf) skc@Ultron:~/Projects/qa-mem/tf-memn2n$ python joint.py
Started Joint Model
/Users/skc/anaconda/envs/mem-tf/lib/python3.5/re.py:203: FutureWarning: split() requires a non-empty pattern match.
  return _compile(pattern, flags).split(string, maxsplit)
Longest sentence length 11
Longest story length 228
Average story length 9
Training Size 18000
Validation Size 2000
Testing Size 20000
(18000, 50, 11) (2000, 50, 11) (20000, 50, 11)
(18000, 11) (2000, 11) (20000, 11)
(18000, 175) (2000, 175) (20000, 175)
WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
Traceback (most recent call last):
  File "joint.py", line 121, in <module>
    for start in range(0, n_train, n_train/20):
TypeError: 'float' object cannot be interpreted as an integer

opened by skckompella 1

Add tutorial to download data before running

I followed the instructions to run single.py, but it fails. It would help to add a tutorial on how and where to download the data.

$ python ./single.py
Started Task: 1
Traceback (most recent call last):
  File "./single.py", line 32, in <module>
    train, test = load_task(FLAGS.data_dir, FLAGS.task_id)
  File "/home/tobe/code/memn2n/data_utils.py", line 14, in load_task
    files = os.listdir(data_dir)
OSError: [Errno 2] No such file or directory: 'data/tasks_1-20_v1-2/en/'

opened by tobegit3hub 1

fix 0 logits in input module
Currently, the nil embedding is 0 (which is fine), and because we pad to a specified memory size we tend to have a bunch of memories which are empty [0 0 ... 0]. The problem is that we feed these into a softmax as is, and exp(0) = 1. On the output, every empty memory therefore gets the same nonzero probability. This is problematic because it alters the probabilities of the non-empty memories.

So the solution is to add a largish negative number to empty memories before softmax is applied. Then the exp() of the value will be 0 or close enough.
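
The idea can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the change made in the repository: `masked_softmax`, the `is_empty` flags and the penalty of -1e9 are all hypothetical choices for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def masked_softmax(logits, is_empty, penalty=-1e9):
    """Softmax that pushes empty (padded) memory slots to ~0 probability.

    A large negative number is added to the logit of every empty slot
    before the softmax, so exp() of that logit is effectively 0.
    """
    shifted = [x + penalty if empty else x
               for x, empty in zip(logits, is_empty)]
    return softmax(shifted)


# Two real memories followed by three padded, all-zero slots.
logits = [2.0, 1.0, 0.0, 0.0, 0.0]
is_empty = [False, False, True, True, True]

plain = softmax(logits)                    # padding takes part of the mass
masked = masked_softmax(logits, is_empty)  # padding gets ~0 probability
```

In the unmasked case the three padded slots absorb a share of the probability mass, which lowers the weight of the two real memories. With the mask, the real memories are normalized only against each other.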

This issue is particularly evident in task 4, where each story consists of 2 sentences. If we make the memory size large, say 50 (only 2 are needed), two things tend to occur:

We converge at a much slower rate

We get a worse error rate

An alternative solution would be to make every batch size 1 (at least at a low level; a higher level API can make this nicer). This way the memory can be of any size, since nothing in the underlying algorithm relies on the memory being a fixed size (at least I think this is the case, have to double check!).
opened by domluna 1
Difference between code and paper

Hi, thank you for your code! It is very helpful.

I noticed a difference between your code and the original paper. The paper uses an embedding to get c for each story, and directly adds o and u to get the input of the prediction layer, or the u for the next layer in the multi-hop case. In your code, c is given the same value as m rather than being recalculated, and o is multiplied by a matrix you call H before being added to u. Why do you do it this way? I haven't tested the difference. Will it influence the performance?
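
For comparison, the paper-side computation described here can be sketched in plain Python: attention weights p_i = softmax(u . m_i), output o = sum_i p_i c_i, and next-hop state u + o. This sketch follows the paper's equations as summarized in this comment; it is not the repository's code, and `memory_hop` is a hypothetical name.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_hop(u, m, c):
    """One hop: attend over input memories m, read from output memories c.

    u: query state (list of floats)
    m: input memory vectors, one per sentence
    c: output memory vectors, one per sentence
    Returns the next state u + o and the attention weights p.
    """
    scores = [sum(u_k * m_k for u_k, m_k in zip(u, m_i)) for m_i in m]
    p = softmax(scores)
    o = [sum(p_i * c_i[k] for p_i, c_i in zip(p, c)) for k in range(len(u))]
    u_next = [u_k + o_k for u_k, o_k in zip(u, o)]
    return u_next, p


# Two memories that match the query equally, so attention is uniform.
u = [1.0, 0.0]
m = [[0.0, 0.0], [0.0, 0.0]]
c = [[2.0, 0.0], [0.0, 2.0]]
u_next, p = memory_hop(u, m, c)
```

A variant that applies an extra matrix before the addition, as this comment describes for the repository, would change only the line that computes `u_next`.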

opened by ZijiaLewisLu 0

tokenize function code in data_utils.py is incorrect

The intended behavior, according to the test, is:

>>> tokenize('Bob dropped the apple. Where is the apple?')
    ['Bob', 'dropped', 'the', 'apple', '.', 'Where', 'is', 'the', 'apple', '?']

To get that result, the function should be written like this:

import re

def tokenize(sent):
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sent)

opened by zpengc 0

Support for Ragged/Jagged arrays

On this line, it is mentioned that there is no support for jagged arrays, but the new Tensorflow v2.1.0 has introduced RaggedTensor. It would be nice if support for this feature could be provided in the current codebase.

opened by jaibhageria 0
Puzzled about the attention part

m_C = tf.reduce_sum(m_emb_C * self._encoding, 2)
c_temp = tf.transpose(m_C, [0, 2, 1])

In this part, the first line with reduce_sum should reduce the matrix to 2 dimensions, so I think the transposition in the second line won't work. I am not sure whether I am getting something wrong.
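
One way to reason about this is to track shapes only. The sketch below assumes m_emb_C is 4-dimensional, (batch, memory_size, sentence_size, embedding_size), which is what an embedding lookup over a 3-dimensional stories array would give; that shape is an assumption, and the two helpers are hypothetical stand-ins that mimic how tf.reduce_sum and tf.transpose change shapes.

```python
def reduce_sum_shape(shape, axis):
    """Shape after summing over `axis` (the axis is dropped)."""
    return tuple(dim for i, dim in enumerate(shape) if i != axis)


def transpose_shape(shape, perm):
    """Shape after permuting the axes according to `perm`."""
    if sorted(perm) != list(range(len(shape))):
        raise ValueError("perm must list every axis exactly once")
    return tuple(shape[i] for i in perm)


# Assumed: (batch, memory_size, sentence_size, embedding_size).
# The batch size of 32 is an arbitrary example value.
m_emb_C_shape = (32, 50, 11, 20)

m_C_shape = reduce_sum_shape(m_emb_C_shape, 2)        # sentence axis removed
c_temp_shape = transpose_shape(m_C_shape, [0, 2, 1])  # last two axes swapped
```

Under that assumption the sum over axis 2 leaves a 3-dimensional result, so a permutation with three entries is valid.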

opened by JustinLin610 1
Found joint.py errors

n_train/20, n_val/20, and n_test/20 cause errors in python3.

I modified n_train/20 -> n_train//20, n_val/20 -> n_val//20, and n_test/20 -> n_test//20, and it works.
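
A minimal sketch of why the change works under Python 3: `/` always returns a float, which range() rejects, while `//` is floor division and returns an integer for integer operands. The value of n_train below is just an example.

```python
n_train = 18000  # example training-set size

# Python 3: "/" yields a float step, and range() raises TypeError.
try:
    range(0, n_train, n_train / 20)
    error_message = None
except TypeError as err:
    error_message = str(err)

# "//" keeps the step an integer, so range() accepts it.
starts = list(range(0, n_train, n_train // 20))
```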

opened by donghyeonk 1
