sequitur is a library that lets you create and train an autoencoder for sequential data in just two lines of code

Overview

sequitur

sequitur is a library that lets you create and train an autoencoder for sequential data in just two lines of code. It implements three different autoencoder architectures in PyTorch, and a predefined training loop. sequitur is ideal for working with sequential data ranging from single and multivariate time series to videos, and is geared for those who want to get started quickly with autoencoders.

import torch
from sequitur.models import LINEAR_AE
from sequitur import quick_train

train_seqs = [torch.randn(4) for _ in range(100)] # 100 sequences of length 4
encoder, decoder, _, _ = quick_train(LINEAR_AE, train_seqs, encoding_dim=2, denoise=True)

encoder(torch.randn(4)) # => torch.tensor([0.19, 0.84])

Each autoencoder learns to represent input sequences as lower-dimensional, fixed-size vectors. This can be useful for finding patterns among sequences, clustering sequences, or converting sequences into inputs for other algorithms.

Installation

Requires Python 3.X and PyTorch 1.2.X

You can install sequitur with pip:

$ pip install sequitur

Getting Started

1. Prepare your data

First, you need to prepare a set of example sequences to train an autoencoder on. This training set should be a list of torch.Tensors, where each tensor has shape [num_elements, *num_features]. So, if each example in your training set is a sequence of 10 5x5 matrices, then each example would be a tensor with shape [10, 5, 5].

2. Choose an autoencoder

Next, you need to choose an autoencoder model. If you're working with sequences of numbers (e.g. time series) or 1D vectors (e.g. word vectors), then you should use the LINEAR_AE or LSTM_AE model. For sequences of 2D matrices (e.g. videos) or 3D matrices (e.g. fMRI scans), you'll want to use CONV_LSTM_AE. Each model is a PyTorch module, and can be imported like so:

from sequitur.models import CONV_LSTM_AE

More details about each model are in the "Models" section below.

3. Train the autoencoder

From here, you can either initialize the model yourself and write your own training loop, or import the quick_train function and plug in the model, training set, and desired encoding size, like so:

import torch
from sequitur.models import CONV_LSTM_AE
from sequitur import quick_train

train_set = [torch.randn(10, 5, 5) for _ in range(100)]
encoder, decoder, _, _ = quick_train(CONV_LSTM_AE, train_set, encoding_dim=4)

After training, quick_train returns the encoder and decoder models, which are PyTorch modules that can encode and decode new sequences. These can be used like so:

x = torch.randn(10, 5, 5)
z = encoder(x) # Tensor with shape [4]
x_prime = decoder(z) # Tensor with shape [10, 5, 5]

API

Training your Model

quick_train(model, train_set, encoding_dim, verbose=False, lr=1e-3, epochs=50, denoise=False, **kwargs)

Lets you train an autoencoder with just one line of code. Useful if you don't want to create your own training loop. Training involves learning a vector encoding of each input sequence, reconstructing the original sequence from the encoding, and calculating the loss (mean-squared error) between the reconstructed input and the original input. The autoencoder weights are updated using the Adam optimizer.

Parameters:

  • model (torch.nn.Module): Autoencoder model to train (imported from sequitur.models)
  • train_set (list): List of sequences (each a torch.Tensor) to train the model on; has shape [num_examples, seq_len, *num_features]
  • encoding_dim (int): Desired size of the vector encoding
  • verbose (bool, optional (default=False)): Whether or not to print the loss at each epoch
  • lr (float, optional (default=1e-3)): Learning rate
  • epochs (int, optional (default=50)): Number of epochs to train for
  • **kwargs: Parameters to pass into model when it's instantiated

Returns:

  • encoder (torch.nn.Module): Trained encoder model; takes a sequence (as a tensor) as input and returns an encoding of the sequence as a tensor of shape [encoding_dim]
  • decoder (torch.nn.Module): Trained decoder model; takes an encoding (as a tensor) and returns a decoded sequence
  • encodings (list): List of tensors corresponding to the final vector encodings of each sequence in the training set
  • losses (list): List of average MSE values at each epoch

Models

Every autoencoder inherits from torch.nn.Module and has an encoder attribute and a decoder attribute, both of which also inherit from torch.nn.Module.

Sequences of Numbers

LINEAR_AE(input_dim, encoding_dim, h_dims=[], h_activ=torch.nn.Sigmoid(), out_activ=torch.nn.Tanh())

Consists of fully-connected layers stacked on top of each other. Can only be used if you're dealing with sequences of numbers, not vectors or matrices.

Parameters:

  • input_dim (int): Size of each input sequence
  • encoding_dim (int): Size of the vector encoding
  • h_dims (list, optional (default=[])): List of hidden layer sizes for the encoder
  • h_activ (torch.nn.Module or None, optional (default=torch.nn.Sigmoid())): Activation function to use for hidden layers; if None, no activation function is used
  • out_activ (torch.nn.Module or None, optional (default=torch.nn.Tanh())): Activation function to use for the output layer in the encoder; if None, no activation function is used

Example:

To create the autoencoder shown in the diagram above, use the following arguments:

from sequitur.models import LINEAR_AE

model = LINEAR_AE(
  input_dim=10,
  encoding_dim=4,
  h_dims=[8, 6],
  h_activ=None,
  out_activ=None
)

x = torch.randn(10) # Sequence of 10 numbers
z = model.encoder(x) # z.shape = [4]
x_prime = model.decoder(z) # x_prime.shape = [10]

Sequences of 1D Vectors

LSTM_AE(input_dim, encoding_dim, h_dims=[], h_activ=torch.nn.Sigmoid(), out_activ=torch.nn.Tanh())

Autoencoder for sequences of vectors which consists of stacked LSTMs. Can be trained on sequences of varying length.

Parameters:

  • input_dim (int): Size of each sequence element (vector)
  • encoding_dim (int): Size of the vector encoding
  • h_dims (list, optional (default=[])): List of hidden layer sizes for the encoder
  • h_activ (torch.nn.Module or None, optional (default=torch.nn.Sigmoid())): Activation function to use for hidden layers; if None, no activation function is used
  • out_activ (torch.nn.Module or None, optional (default=torch.nn.Tanh())): Activation function to use for the output layer in the encoder; if None, no activation function is used

Example:

To create the autoencoder shown in the diagram above, use the following arguments:

from sequitur.models import LSTM_AE

model = LSTM_AE(
  input_dim=3,
  encoding_dim=7,
  h_dims=[64],
  h_activ=None,
  out_activ=None
)

x = torch.randn(10, 3) # Sequence of 10 3D vectors
z = model.encoder(x) # z.shape = [7]
x_prime = model.decoder(z, seq_len=10) # x_prime.shape = [10, 3]

Sequences of 2D/3D Matrices

CONV_LSTM_AE(input_dims, encoding_dim, kernel, stride=1, h_conv_channels=[1], h_lstm_channels=[])

Autoencoder for sequences of 2D or 3D matrices/images, loosely based on the CNN-LSTM architecture described in Beyond Short Snippets: Deep Networks for Video Classification. Uses a CNN to create vector encodings of each image in an input sequence, and then an LSTM to create encodings of the sequence of vectors.

Parameters:

  • input_dims (tuple): Shape of each 2D or 3D image in the input sequences
  • encoding_dim (int): Size of the vector encoding
  • kernel (int or tuple): Size of the convolving kernel; use tuple to specify a different size for each dimension
  • stride (int or tuple, optional (default=1)): Stride of the convolution; use tuple to specify a different stride for each dimension
  • h_conv_channels (list, optional (default=[1])): List of hidden channel sizes for the convolutional layers
  • h_lstm_channels (list, optional (default=[])): List of hidden channel sizes for the LSTM layers

Example:

from sequitur.models import CONV_LSTM_AE

model = CONV_LSTM_AE(
  input_dims=(50, 100),
  encoding_dim=16,
  kernel=(5, 8),
  stride=(3, 5),
  h_conv_channels=[4, 8],
  h_lstm_channels=[32, 64]
)

x = torch.randn(22, 50, 100) # Sequence of 22 50x100 images
z = model.encoder(x) # z.shape = [16]
x_prime = model.decoder(z, seq_len=22) # x_prime.shape = [22, 50, 100]
Comments
  • Asking for multi signal support

    Asking for multi signal support

    Thanks for sharing this nice library!

    I wonder if it is possible to use your work for encoding multi signals.

    I have 4 sensors where each gives 400 samples per instance ( time series, an equal number of samples.) So each of my objects has 4*400 samples ( and this have one label) . As an example, you can consider a dataset like the one used in this work: https://github.com/guillaume-chevalier/LSTM-Human-Activity-Recognition

    So in this, there is

    INPUT_SIGNAL_TYPES = [
        "body_acc_x_",
        "body_acc_y_",
        "body_acc_z_",
        "body_gyro_x_",
        "body_gyro_y_",
        "body_gyro_z_",
        "total_acc_x_",
        "total_acc_y_",
        "total_acc_z_"
    ]
    

    I would like to feed an autoencoder similar to the one done in the above-mentioned repo. So, the input tensor looks like this (7500, 128,9) => ( nb_of_objects, length of each sequence, nb_of_input_signals) in like this :

    ( this is object 0 of 7500 objects of train set)

    [[ body_acc_x_[0], body_acc_y_[0], ... ,total_acc_y_[0],total_acc_z_[0]],
     [body_acc_x_[1], body_acc_y_[1], ... ,total_acc_y_[1],total_acc_z_[1] ] ,
     ...
     [body_acc_x_[127], body_acc_y_[127], ... ,total_acc_y_[127],total_acc_z_[127] ]]
    

    Could you please tell me if it is possible to encode such signals using your work? If yes, please give me a little hint on how to do so.

    In addition, I don't see any latent space visualization, Do you suggest any specific library to use for such a task?

    opened by sadransh 5
  • Fix issues / Enhance code in LSTM AE

    Fix issues / Enhance code in LSTM AE

    Fix deprecation issue in MSELoss (reduction is now the proper keyword, 'sum' is the mode it used to operate in so I preserved it here) Fix issue where squeezing x in lstm decoder would result in a 1D Tensor, thus returning an error on torch.mm Add gradient clipping Add device argument to quick_train (also goes for train_model and get_encodings)

    opened by Quoding 2
  • Demo not working

    Demo not working

    HI, so just to let you know, trying the dome does not seams to work out


    AttributeError Traceback (most recent call last) in 8 torch.tensor([9, 10, 11, 12]) 9 ] ---> 10 encoder, decoder, _, _ = quick_train(LINEAR_AE, train_seqs, encoding_dim=2, denoise=True) 11 12 encoder(torch.tensor([13, 14, 15, 16])) # => torch.tensor([0.19, 0.84])

    ~/.local/lib/python3.6/site-packages/sequitur/quick_train.py in quick_train(model, train_set, encoding_dim, verbose, lr, epochs, denoise, **kwargs) 80 epochs=50, denoise=False, **kwargs): 81 model = instantiate_model(model, train_set, encoding_dim, **kwargs) ---> 82 losses = train_model(model, train_set, verbose, lr, epochs, denoise) 83 encodings = get_encodings(model, train_set) 84

    ~/.local/lib/python3.6/site-packages/sequitur/quick_train.py in train_model(model, train_set, verbose, lr, epochs, denoise) 30 # device = torch.device("cuda" if torch.cuda.is_available() else "cpu") 31 # model.to(device) ---> 32 optimizer = torch.optim.Adam(model.parameters(), lr=lr) 33 criterion = MSELoss(size_average=False) 34

    AttributeError: 'NoneType' object has no attribute 'parameters'

    opened by pierreyvesm 2
  • Fix issues / enhance quick_train

    Fix issues / enhance quick_train

    Fix deprecation issue in MSELoss (reduction is the proper keyword) Fix issue where squeezing x in lstm decoder would result in a 1D Tensor, thus returning an error on torch.mm Add device optional keyword to quick_train

    Also, format the code via Black.

    opened by Quoding 0
  • Linking code to academic paper

    Linking code to academic paper

    Hi @shobrook

    In the README of this repo, it might help to link to an official academic paper. This one seems most fitting given the approach used here: https://arxiv.org/pdf/1607.00148.pdf

    Oliver

    opened by oliverangelil 0
  • Adding Batch support for LSTM_AE

    Adding Batch support for LSTM_AE

    Hello there, I have added batch support for the LSTM_AE. I don't know if this is 100% compatible with the library, but I'm only using the LSTM_AE in one of my projects, so I decided to contribute 😃 . Thanks

    opened by Mustapha-AJEGHRIR 0
  • about LSTM ae result less than 1

    about LSTM ae result less than 1

    hi, i was use the trail dataset like [[x1,y1], [x2,y2]] and change the shape to [[x1,x2,x3], [y1,y2,y3]] length about 150 and use LSTM AE to train it . i want to use this model to Implement representation learning.but the loss could not reduce. i did not know how to solve it. could you give me some suggestion? I will be looking forward to your rep

    opened by guowhite 1
  • use of deprecated size_average=False in train_model

    use of deprecated size_average=False in train_model

    Training models gives the warning:

    /usr/local/lib/python3.6/dist-packages/torch/nn/_reduction.py:44: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
      warnings.warn(warning.format(ret))
    

    replacing line 28 of quick_train.py with

       criterion = MSELoss(reduction='sum')
    

    appears to be the recommended way to address this.

    opened by gsimmskutta 0
Owner
Jonathan Shobrook
Jonathan Shobrook
Recurrent Variational Autoencoder that generates sequential data implemented with pytorch

Pytorch Recurrent Variational Autoencoder Model: This is the implementation of Samuel Bowman's Generating Sentences from a Continuous Space with Kim's

Daniil Gavrilov 347 Nov 14, 2022
CoSMA: Convolutional Semi-Regular Mesh Autoencoder. From Paper "Mesh Convolutional Autoencoder for Semi-Regular Meshes of Different Sizes"

Mesh Convolutional Autoencoder for Semi-Regular Meshes of Different Sizes Implementation of CoSMA: Convolutional Semi-Regular Mesh Autoencoder arXiv p

Fraunhofer SCAI 10 Oct 11, 2022
Train neural network for semantic segmentation (deep lab V3) with pytorch in less then 50 lines of code

Train neural network for semantic segmentation (deep lab V3) with pytorch in 50 lines of code Train net semantic segmentation net using Trans10K datas

null 17 Dec 19, 2022
Python wrapper class for OpenVINO Model Server. User can submit inference request to OVMS with just a few lines of code

Python wrapper class for OpenVINO Model Server. User can submit inference request to OVMS with just a few lines of code.

Yasunori Shimura 7 Jul 27, 2022
Create Data & AI apps in 20 lines of code with Shimoku

Install with: pip install shimoku-api-python Start with: from os import getenv import shimoku_api_python.client as Shimoku

Shimoku 5 Nov 7, 2022
Leveraging Two Types of Global Graph for Sequential Fashion Recommendation, ICMR 2021

This is the repo for the paper: Leveraging Two Types of Global Graph for Sequential Fashion Recommendation Requirements OS: Ubuntu 16.04 or higher ver

Yujuan Ding 10 Oct 10, 2022
Code for Two-stage Identifier: "Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition"

Code for Two-stage Identifier: "Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition", accepted at ACL 2021. For details of the model and experiments, please see our paper.

tricktreat 87 Dec 16, 2022
Moving Object Segmentation in 3D LiDAR Data: A Learning-based Approach Exploiting Sequential Data

LiDAR-MOS: Moving Object Segmentation in 3D LiDAR Data This repo contains the code for our paper: Moving Object Segmentation in 3D LiDAR Data: A Learn

Photogrammetry & Robotics Bonn 394 Dec 29, 2022
Just-Now - This Is Just Now Login Friendlist Cloner Tools

JUST NOW LOGIN FRIENDLIST CLONER TOOLS Install $ apt update $ apt upgrade $ apt

MAHADI HASAN AFRIDI 21 Mar 9, 2022
Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation, available for both PyTorch and Tensorflow.

Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation, available for both PyTorch and Tensorflow.

null 730 Jan 9, 2023
Project looking into use of autoencoder for semi-supervised learning and comparing data requirements compared to supervised learning.

Project looking into use of autoencoder for semi-supervised learning and comparing data requirements compared to supervised learning.

Tom-R.T.Kvalvaag 2 Dec 17, 2021
Quickly and easily create / train a custom DeepDream model

Dream-Creator This project aims to simplify the process of creating a custom DeepDream model by using pretrained GoogleNet models and custom image dat

null 55 Dec 27, 2022
Code image classification of MNIST dataset using different architectures: simple linear NN, autoencoder, and highway network

Deep Learning for image classification pip install -r http://webia.lip6.fr/~baskiotisn/requirements-amal.txt Train an autoencoder python3 train_auto

Hector Kohler 0 Mar 30, 2022
Code of 3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces

3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces Installation After cloning the repo open

null 37 Dec 3, 2022
SparseML is a libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models

SparseML is a toolkit that includes APIs, CLIs, scripts and libraries that apply state-of-the-art sparsification algorithms such as pruning and quantization to any neural network. General, recipe-driven approaches built around these algorithms enable the simplification of creating faster and smaller models for the ML performance community at large.

Neural Magic 1.5k Dec 30, 2022
Fine-Tune EleutherAI GPT-Neo to Generate Netflix Movie Descriptions in Only 47 Lines of Code Using Hugginface And DeepSpeed

GPT-Neo-2.7B Fine-Tuning Example Using HuggingFace & DeepSpeed Installation cd venv/bin ./pip install -r ../../requirements.txt ./pip install deepspe

Nikita 180 Jan 5, 2023
a delightful machine learning tool that allows you to train, test and use models without writing code

igel A delightful machine learning tool that allows you to train/fit, test and use models without writing code Note I'm also working on a GUI desktop

Nidhal Baccouri 3k Jan 5, 2023
Pipeline code for Sequential-GAM(Genome Architecture Mapping).

Sequential-GAM Pipeline code for Sequential-GAM(Genome Architecture Mapping). mapping whole_preprocess.sh include the whole processing of mapping. usa

null 3 Nov 3, 2022