Implementation of Memformer, a Memory-augmented Transformer, in Pytorch

Phil Wang

Last update: Nov 6, 2022

Memformer - Pytorch

Implementation of Memformer, a Memory-augmented Transformer, in Pytorch. The model keeps a fixed set of memory slots that are updated with attention as each segment is processed, and it is trained efficiently through time with Memory Replay Back-Propagation (MRBP).
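To make "memory slots updated with attention" concrete, here is a minimal sketch in which each memory slot acts as a query over the encoded tokens of one segment. It is an illustration only: the function name `update_memory_slots` is hypothetical, and the sketch leaves out learned projections, multiple heads, and any gating or normalization the actual library or paper may use.

```python
import torch

def update_memory_slots(mems, enc_out):
    """Single-head attention where memory slots query the encoded segment.

    mems:    (batch, num_slots, dim)
    enc_out: (batch, seq_len, dim)
    returns: (batch, num_slots, dim)
    """
    scale = mems.shape[-1] ** -0.5
    # similarity of every slot (query) to every token (key)
    scores = torch.matmul(mems, enc_out.transpose(-1, -2)) * scale
    # each slot distributes its attention over the tokens of the segment
    attn = scores.softmax(dim=-1)
    # new slot content is the attention-weighted sum of token features
    return torch.matmul(attn, enc_out)

mems = torch.randn(1, 128, 512)      # 128 memory slots, as in the examples below
enc_out = torch.randn(1, 1024, 512)  # one encoded segment
new_mems = update_memory_slots(mems, enc_out)
print(new_mems.shape)
```

The number of slots stays fixed no matter how long the segment is, which is what lets the memory be carried from one segment to the next.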

Install

$ pip install memformer

Usage

Full encoder / decoder, as in the paper

import torch
from memformer import Memformer

model = Memformer(
    dim = 512,
    enc_num_tokens = 256,
    enc_depth = 2,
    enc_heads = 8,
    enc_max_seq_len = 1024,
    dec_num_tokens = 256,
    dec_depth = 2,
    dec_heads = 8,
    dec_max_seq_len = 1024,
    num_memory_slots = 128
)

src_seg_1 = torch.randint(0, 256, (1, 1024))
src_seg_2 = torch.randint(0, 256, (1, 1024))
src_seg_3 = torch.randint(0, 256, (1, 1024))

tgt = torch.randint(0, 256, (1, 1024))

enc_out1, mems1,    _ = model(src_seg_1) # (1, 1024, 512), (1, 128, 512), _
enc_out2, mems2,    _ = model(src_seg_2, mems = mems1)
enc_out3, mems3, loss = model(src_seg_3, tgt, mems = mems2)

loss.backward()
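The three source segments above are created by hand. For a long token sequence, the cutting step could look like the sketch below. `split_into_segments` is a hypothetical helper, not part of memformer, and it works on a plain Python list. Each chunk would still need to be turned into a tensor of shape (1, segment_len), and a shorter final chunk would presumably need padding and a mask.

```python
def split_into_segments(tokens, segment_len):
    """Cut a flat token list into consecutive chunks of at most segment_len items."""
    return [tokens[i:i + segment_len] for i in range(0, len(tokens), segment_len)]

tokens = list(range(3000))                   # stand-in for 3000 token ids
segments = split_into_segments(tokens, 1024)
print([len(s) for s in segments])            # [1024, 1024, 952]
```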

Encoder only

import torch
from memformer import Memformer

model = Memformer(
    dim = 512,
    enc_num_tokens = 256,
    enc_heads = 8,
    enc_depth = 2,
    enc_max_seq_len = 1024,
    num_memory_slots = 128,
    num_mem_updates = 2,
    encoder_only = True       # encoder only: the model returns the encoded output and the updated memories
)

src1 = torch.randint(0, 256, (1, 1024))
src2 = torch.randint(0, 256, (1, 1024))

enc1, mems1 = model(src1) # (1, 1024, 512), (1, 128, 512)
enc2, mems2 = model(src2, mems = mems1)
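With more than two segments, the same pattern can be written as a loop that threads the memories through. This is a sketch under assumptions: `encode_segments` is a hypothetical helper, and it assumes that passing `mems = None` on the first call behaves like omitting the argument, which the README does not state. The `counting_model` function is only a stand-in so the snippet runs without the library.

```python
def encode_segments(model, segments, mems=None):
    """Run a model over segments in order, passing the memories along."""
    outputs = []
    for segment in segments:
        # assumption: model(segment, mems=...) returns (encoded, new_mems)
        encoded, mems = model(segment, mems=mems)
        outputs.append(encoded)
    return outputs, mems

def counting_model(segment, mems=None):
    # stand-in for a real model: "memory" is just the number of tokens seen so far
    seen = (mems or 0) + len(segment)
    return [t * 2 for t in segment], seen

outputs, final_mems = encode_segments(counting_model, [[1, 2], [3, 4, 5]])
print(outputs, final_mems)  # [[2, 4], [6, 8, 10]] 5
```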

Memory Replay Back-Propagation

import torch
from memformer import Memformer, memory_replay_backprop

model = Memformer(
    dim = 512,
    num_memory_slots = 128,
    enc_num_tokens = 256,
    enc_depth = 2,
    enc_max_seq_len = 1024,
    dec_num_tokens = 256,
    dec_depth = 2,
    dec_max_seq_len = 1024
).cuda()

seq = torch.randint(0, 256, (1, 8192)).cuda()
seq_mask = torch.ones_like(seq).bool().cuda()

tgt = torch.randint(0, 256, (1, 512)).cuda()
tgt_mask = torch.ones_like(tgt).bool().cuda()

# the 8192-token source sequence is automatically split into 8 segments of 1024 tokens
memory_replay_backprop(
    model,
    src = seq,
    tgt = tgt,
    src_mask = seq_mask,
    tgt_mask = tgt_mask
)
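The idea behind memory replay can be sketched on a toy recurrent cell. A first pass runs without a graph and records the memory entering each segment. A second pass walks the segments in reverse, rebuilds the graph for one segment at a time, and hands the memory gradient to the previous segment. Everything here (`ToyMemoryCell`, `replay_backprop`, a loss on the final memory only) is an assumption made for illustration, and the library's `memory_replay_backprop` may well differ in its details.

```python
import torch
from torch import nn

class ToyMemoryCell(nn.Module):
    """Hypothetical stand-in for a memory model: (segment, mem) -> new mem."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim * 2, dim)

    def forward(self, segment, mem):
        return torch.tanh(self.proj(torch.cat((segment, mem), dim=-1)))

def replay_backprop(cell, segments, init_mem, loss_fn):
    # Pass 1: forward only, no graph; remember the memory entering each segment.
    mems_in = []
    mem = init_mem
    with torch.no_grad():
        for segment in segments:
            mems_in.append(mem)
            mem = cell(segment, mem)

    # Pass 2: go backwards, recomputing one segment's graph at a time.
    grad_from_future = None
    for segment, mem_in in zip(reversed(segments), reversed(mems_in)):
        mem_in = mem_in.detach().requires_grad_()
        mem_out = cell(segment, mem_in)
        if grad_from_future is None:
            loss_fn(mem_out).backward()         # last segment: start from the loss
        else:
            mem_out.backward(grad_from_future)  # earlier segment: continue the chain
        grad_from_future = mem_in.grad          # gradient w.r.t. the incoming memory

torch.manual_seed(0)
cell = ToyMemoryCell(dim=16)
segments = [torch.randn(2, 16) for _ in range(4)]
init_mem = torch.zeros(2, 16)

# parameter gradients accumulate in cell.parameters() across the backward calls
replay_backprop(cell, segments, init_mem, loss_fn=lambda m: m.pow(2).mean())
```

Only one segment's activations are held at any time, so peak memory does not grow with the number of segments. The price is one extra forward pass per segment.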

Citations

@inproceedings{
    anonymous2021memformer,
    title={Memformer: The Memory-Augmented Transformer},
    author={Anonymous},
    booktitle={Submitted to International Conference on Learning Representations},
    year={2021},
    url={https://openreview.net/forum?id=_adSMszz_g9},
    note={under review}
}

Comments

WIP - MemformerEncoder
I'm always trying out your awesome work on transformers. My problem is NER on very large texts, with few examples.

Memformer is the first one so far to converge faster and yield better accuracy than RNN encoders such as LSTM, SRU and IndRNN. It is ridiculously better than everything else I tested, congratulations @lucidrains 🥳

I need to use the transformer as an encoder in my pipeline, to feed a CRF layer. So I modified the code to accept an already embedded input, and to only do the encode step.

TODO:

[ ] Support Mask

[ ] Re-utilize code with Memformer class

Is this within the scope of the project?

opened by bratao 10

ETA on complete examples

@lucidrains As I asked about the feedback-transformer, I was also wondering about this memformer implementation as I would love to try it. Any eta on any complete examples here? They will be much appreciated. Thanks.

And similarly, I would love to see a simple example for custom line-by-line TXT datasets as well.

Thank you again :)

opened by asigalov61 0

Releases (0.3.1)

0.3.1 (Nov 13, 2020)
0.3.0 (Nov 6, 2020)
0.2.1 (Nov 3, 2020)
0.2.0 (Nov 3, 2020)
0.1.1 (Nov 3, 2020)
0.1.0 (Nov 2, 2020)
0.0.9 (Oct 31, 2020)
0.0.8 (Oct 30, 2020)
0.0.7 (Oct 30, 2020)
0.0.6 (Oct 30, 2020)
0.0.5 (Oct 29, 2020)
0.0.4 (Oct 29, 2020)
0.0.3 (Oct 29, 2020)
0.0.2 (Oct 28, 2020)
0.0.1 (Oct 28, 2020)
