Sequence modeling benchmarks and temporal convolutional networks


Sequence Modeling Benchmarks and Temporal Convolutional Networks (TCN)

This repository contains the experiments from the paper "An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling" by Shaojie Bai, J. Zico Kolter and Vladlen Koltun.

We specifically target a comprehensive set of tasks that have been repeatedly used to compare the effectiveness of different recurrent networks, and evaluate a simple, generic but powerful (purely) convolutional network on the recurrent nets' home turf.

Experiments are done in PyTorch. If you find this repository helpful, please cite our work:

@article{BaiTCN2018,
	author    = {Shaojie Bai and J. Zico Kolter and Vladlen Koltun},
	title     = {An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling},
	journal   = {arXiv:1803.01271},
	year      = {2018},
}

Domains and Datasets

Update: The code should run as-is with PyTorch v1.0.0 or above (PyTorch v>1.3.0 strongly recommended). Older versions of PyTorch are no longer supported.

This repository contains benchmarks for the following tasks, with details explained in each sub-directory:

  • The Adding Problem with various T (we evaluated on T=200, 400, 600)
  • Copying Memory Task with various T (we evaluated on T=500, 1000, 2000)
  • Sequential MNIST digit classification
  • Permuted Sequential MNIST (based on Seq. MNIST, but more challenging)
  • JSB Chorales polyphonic music
  • Nottingham polyphonic music
  • PennTreebank [SMALL] word-level language modeling (LM)
  • Wikitext-103 [LARGE] word-level LM
  • LAMBADA [LARGE] word-level LM and textual understanding
  • PennTreebank [MEDIUM] char-level LM
  • text8 [LARGE] char-level LM

Some of the large datasets are not included in this repo. We use the observations package to download them; it can be installed with pip.
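For orientation, the adding problem in the list above is commonly set up as a two-channel sequence: one channel of random values and one channel of markers that selects two positions, with the sum of the two marked values as the target. The sketch below follows that common formulation; the function name is hypothetical and the repository's own generator may place the markers differently.

```python
import random

def make_adding_example(T, rng):
    """Return (values, markers, target) for one adding-problem sequence of length T."""
    values = [rng.random() for _ in range(T)]   # channel 0: random numbers in [0, 1)
    markers = [0.0] * T                         # channel 1: zeros except for two ones
    i, j = rng.sample(range(T), 2)              # two distinct marked positions
    markers[i] = markers[j] = 1.0
    target = values[i] + values[j]              # the sum the model has to output
    return values, markers, target

rng = random.Random(0)
values, markers, target = make_adding_example(200, rng)
print(len(values), sum(markers))
```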

Usage

Each task is contained in its own directory, with the following structure:

[TASK_NAME] /
    data/
    [TASK_NAME]_test.py
    models.py
    utils.py

To run the TCN model on a task, one only needs to run [TASK_NAME]_test.py (e.g. add_test.py). Hyperparameters can be set through command-line arguments, which are listed with the -h flag.
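As a rough illustration of how such argument options usually work in Python, here is a minimal argparse sketch. The flag names and defaults are hypothetical and are not taken from the repository's scripts; run the actual script with -h to see its real options.

```python
import argparse

def build_parser():
    # All flag names and defaults below are hypothetical, for illustration only.
    parser = argparse.ArgumentParser(description="Hypothetical TCN task runner")
    parser.add_argument("--lr", type=float, default=1e-3, help="learning rate")
    parser.add_argument("--levels", type=int, default=8, help="number of residual levels")
    parser.add_argument("--ksize", type=int, default=7, help="convolution kernel size")
    return parser

# Passing an explicit list stands in for the command line "--levels 4".
args = build_parser().parse_args(["--levels", "4"])
print(args.lr, args.levels, args.ksize)
```

argparse adds the -h/--help flag automatically, so every option declared this way shows up in the help text.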

Comments
  • About different sequence input

    My sequences vary a lot in length: the shortest is about 100 words and the longest about 5000. I tried zero-padding them all to a length of 5000, but the classification results were terrible. If I instead feed each sequence at its original length, which means using a batch_size of 1, it works well. I don't know why this happens.

    opened by LemoJa 10
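One possible explanation for the question above, offered as an assumption since the post does not show its code: if the classifier reads the final time step, then with right-padded input that position holds padding rather than the end of the real sequence. The pure-Python sketch below uses a single bias-free causal convolution (a simplification of a full TCN) to show that right-padding leaves the outputs at the valid positions unchanged, so the output can be read at the last valid index instead.

```python
def causal_conv(x, weights, dilation=1):
    """Causal 1-D convolution: y[t] = sum_i weights[i] * x[t - i*dilation]."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            idx = t - i * dilation
            if idx >= 0:                 # positions before the start count as zero
                acc += w * x[idx]
        y.append(acc)
    return y

seq = [1.0, 2.0, 3.0]                    # the real sequence
weights = [0.5, 0.25]                    # toy filter, chosen arbitrarily
right_padded = seq + [0.0] * 4           # zero-padded to length 7

out = causal_conv(seq, weights)
out_right = causal_conv(right_padded, weights)

print(out)                               # outputs for the unpadded sequence
print(out_right[:len(seq)])              # identical at the valid positions
print(out_right[-1])                     # final position only sees padding
```

Under these assumptions, indexing the output at each sequence's true length minus one, rather than at the last position of the padded batch, keeps batching possible without reading padded positions.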
  • Can the TCN module be made faster?

    I'm using your TCN module for a language modeling task. My code follows the structure of your char_cnn code. It works but the performance is very bad compared to an LSTM network. Each epoch with the TCN network takes about 10 times longer. Do you know if the performance can be improved? Here is the forward method from the TCN class:

        def forward(self, x):
            emb = self.drop(self.encoder(x))
            y = self.tcn(emb.transpose(1, 2))
            o = self.decoder(y.transpose(1, 2))
            return o.contiguous()
    

    Perhaps it is the transpose calls that are making the code slow?

    opened by bjourne 8
  • When I use 28 sequence length for LSTM and TCN, LSTM is much faster than TCN.

    It seems to me that LSTM is faster when the sequence length is short (say 28). When the sequence length is long (say 784), LSTM will be much slower than TCN.

    It seems to me for TCN, the computation time is independent of the sequence length.

    Am I correct?

    opened by KinWaiCheuk 7
  • RNN/LSTM Baselines?

    This is a great set of experiments! I'm wondering if the code for the RNN/LSTM baselines reported in the paper is available somewhere. At present, I only see code for the TCN model.

    Thanks!

    opened by millerjohnp 7
  • why temporal pad at all?

    Nice work! I'm researching time series regression using machine learning so I'm looking at LSTM, TCN and Transformers based models and getting good results with your model.

    One general question: I'm not sure I understand why we pad each layer of a TCN at all. I understand that it ensures each layer produces a sequence of the same length, so there's a benefit in that your predictions are aligned with your inputs. But it's very similar to initialising an AR(p) model with a vector of zeros when you predict forward: the initial predictions will all be "wrong" until the effect of the initial state has decayed out. LSTMs also have this issue. Most applications seem to set the initial state per batch to zero, which results in transient errors at the start of the batch (some authors train a separate model to estimate the initial state, which I've had good success with). I would assume this would impact training as well, and it seems to make sense to mask out the start of the output sequence when calculating the loss, or the model may try to adapt to "fix" the impact of the wrong initial condition.

    Certainly when I train a regression-based TCN I can observe transient errors at the start of the prediction, i.e. the diagram below underpredicts for the first 96 samples (that's 1 day of 15-minute electricity consumption) then overpredicts for the first week before settling down. Interested in your thoughts.

    Also, one general observation: the predictions from the TCN seem noisier than those from the LSTM. I thought the long AR window might filter out more noise than it has. Plus it's quite sensitive to the learning rate; a low learning rate produces a very noisy output sequence.

    image

    opened by david-waterworth 6
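The masking idea raised in the post above can be sketched as a loss that skips the first few time steps. This is only an illustration in plain Python: the function name and the warmup parameter are hypothetical, and a natural (assumed) choice for warmup is the network's receptive field minus one, so that every scored output has seen only real inputs.

```python
def masked_mse(pred, target, warmup):
    """Mean squared error that ignores the first `warmup` time steps."""
    if warmup >= len(pred):
        raise ValueError("warmup must be shorter than the sequence")
    errs = [(p - t) ** 2 for p, t in zip(pred[warmup:], target[warmup:])]
    return sum(errs) / len(errs)

pred = [0.0, 0.0, 1.0, 2.0]      # toy predictions with a bad start-up transient
target = [5.0, 5.0, 1.0, 2.5]

full = masked_mse(pred, target, 0)     # transient dominates the loss
masked = masked_mse(pred, target, 2)   # transient excluded
print(full, masked)
```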
  • Recommendation for image to text

    My goal is to train a model that can output sequences of text from image inputs. Using the IAM handwriting dataset for example, we would pass the model an image

    image

    and expect it to return "broadcast and television report on his". Historically, the common (i.e. recurrent) way to accomplish this would be an encoder (CNN) + decoder (LSTM) architecture like OpenNMT's implementation. I am interested in replacing the decoder with a TCN, but am unsure how to approach the image data. The CNN encoder will create a batch of N feature maps with reduced spatial dimensions (H', W')

    image

    The issue is a TCN expects 3D tensors (N, L, C) whereas each "timestep" of the image is 2D (N, H, W, C). Following the p-MNIST example in the paper, we could flatten the image into a 1D sequence with length H' x W'. Then the TCN would effectively snake through the pseudo-timesteps like below

    image

    However, if we want one prediction per timestep it makes much more sense to define a left-to-right sequence instead of a snaking one since that's the direction the text is depicted in the image. Did you experiment at all with image to text models, and if so, how did you choose to represent the images?

    I also wonder about the loss function for training a TCN decoder. Assuming you divide the image width into more timesteps than your maximum expected sequence length, it seems like connectionist temporal classification (CTC) would be a good choice. Then you do not have to worry about alignment between the target sequence and model's prediction. For instance, "bbb--ee-cau--sssss----e" would be collapsed to "because" by combining neighboring duplicates and removing blanks. Do you agree or is there a different loss function you would suggest?

    opened by addisonklinke 5
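The collapsing rule described in the post above (merge neighboring duplicates, then remove blanks) can be written in a few lines. This sketch covers only that decoding step, not the CTC loss itself; the function name is hypothetical and "-" is assumed to be the blank symbol, as in the post's example.

```python
from itertools import groupby

def ctc_collapse(path, blank="-"):
    """Merge runs of repeated symbols, then drop blanks."""
    merged = [symbol for symbol, _ in groupby(path)]
    return "".join(s for s in merged if s != blank)

print(ctc_collapse("bbb--ee-cau--sssss----e"))  # prints "because"
```

A blank between two identical symbols keeps them distinct, which is how repeated letters survive the merge: "l-l" collapses to "ll" while "ll" collapses to "l".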
  • Clarification on figure 3(a)

    Hello and thanks for the paper and the helpful codebase!

    I just wanted to clarify how the convergence plots in the paper were generated, particularly fig 3(a). The Y axis is labelled test accuracy, however the X values seem to be more frequent than every epoch. Could you confirm what data is being evaluated on here and if smoothing is taking place? Thanks

    opened by alanjeffares 4
  • How to reproduce results from the paper?

    Is it just the testing result in the last epoch using default parameters? I have tried to run add_test.py and below is the result I get for the 10 epochs.

    Test set: Average loss: 0.168699
    Test set: Average loss: 0.001142
    Test set: Average loss: 0.000922
    Test set: Average loss: 0.000345
    Test set: Average loss: 0.000143
    Test set: Average loss: 0.000188
    Test set: Average loss: 0.000121
    Test set: Average loss: 0.000028
    Test set: Average loss: 0.000244
    Test set: Average loss: 0.000042

    Which one should I use for benchmarking? In the paper, the result of TCN was 5.8e-5 but it seems like we can use 2.8e-5 or 4.2e-5 here.

    opened by johnsyin 4
  • Cutting off effective history when evaluating char_cnn model.

    I don't get why, at test time (or when evaluating the model on a validation set), we compute the loss only on the part of the sequence that has sufficient history, rather than on the whole sequence. The model is not evaluated on the whole dataset but only on a sub-part, so are the results reliable? Are they even comparable to other models (LSTM, etc.) that don't use this method?

    opened by mok33 4
  • Is TCN suitable for time series regression?

    Hello,

    Thank you for your great paper and sharing!

    I'm wondering how to use TCN to solve time series regression problems. In my time series scenario, data for each moment contains multiple variables and each variable is a real number. For example, data for time step 0 is something like "vector_0 = <0.1, 0.2, 0.3, ...>", and I want to use historical k vectors to predict the next vector data.

    I have developed an LSTM model for this problem. The input shape of the LSTM model is "batch_size, time_steps(k), input_size(length of each vector)", and the prediction is the last output of the LSTM. Then I can calculate the MSE loss and backpropagate. How can I use TCN to solve this problem?

    Best Regards

    opened by liuzf13 4
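One way to frame the setup described above is to cut the series into (history of k vectors, next vector) pairs. The plain-Python sketch below does only the data preparation; the function name is hypothetical, and the channels-first layout is an assumption based on the convention of PyTorch's Conv1d, which takes input shaped (batch, channels, length). A model would then map each history to a prediction, for example by reading the output at the last time step, and be trained with MSE as in the LSTM version.

```python
def make_windows(series, k):
    """Split a multivariate series into (history, next_vector) training pairs.

    series: list of T vectors, each a list of C floats.
    Each history is returned channels-first: C lists of length k.
    """
    pairs = []
    for t in range(k, len(series)):
        window = series[t - k:t]                              # k most recent vectors
        channels_first = [list(ch) for ch in zip(*window)]    # (C, k) layout
        pairs.append((channels_first, series[t]))             # target is the next vector
    return pairs

series = [[0.1 * t, 0.2 * t] for t in range(6)]   # toy data: T=6 steps, C=2 variables
pairs = make_windows(series, k=3)
print(len(pairs), len(pairs[0][0]), len(pairs[0][0][0]))
```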
  • Causal Transposed Convolution

    Hi,

    Thanks for this great paper ... I am trying to use this architecture in an auto-encoder setting such that the encoder part is a stack of strided-dilated-causal conv layers, and I am now thinking about the decoder part.

    In terms of up-sampling using transposed convolutions, does it follow the same intuition in order to have causal up-sampling (i.e. to exclude the reconstructions of future part) ? Or shall we generate sample-by-sample without transposed conv layers ?

    With many thanks in advance Best Regards

    opened by ahmed-fau 4
  • Correlate .mat files with songs in Nottingham dataset

    I have all the .abc files. I have made sure the shape of all the X variables combined is the same as the number of songs. But after I load the .mat file for Nottingham how do I say that the first element in the numpy array corresponds to the song entitled "...."? I want to match data with songs listed in the .abc files.

    opened by demongolem-biz2 0
  • why?

    I changed the output to "return torch.mean(self.network(x),dim=2)" for multi-feature time series and the training time dropped significantly. So why? lol

    opened by RaganrokV 0
  • What is the accuracy supposed to be for the MNIST problem?

    After each epoch (at least the first 6 so far), I get 982/10000 (10%) accuracy which can't be right. What should the accuracy be as was originally designed?

    opened by demongolem-biz 0
  • Code Question about: input the final conv-layer output to the linear layer

    Great code guys! Can I ask a question about this code? https://github.com/locuslab/TCN/blob/master/TCN/adding_problem/model.py#L17

    Usually when I implement CNN-style models, calculating the dimension of the last CNN layer is always a problem.

    In your code, at line 17, it looks like the final linear layer only takes a part of the conv-layer output. Is this understanding correct? Does it ignore many of the other values in the conv-layer output?

    What shocks me is that when I test such an implementation on other traditional CNN models, they also work (I mean just using something like self.linear(y1[:, :, -1])). Does this mean the task is simple for the designed CNN, because we just dropped a lot of neurons in it?

    It would be highly appreciated if someone could advise.

    opened by ShengzheXu 0
  • How should I choose correct layers number?

    Suppose my input sequences are of varying lengths (between 15 ~ 400) and 3 features.

    What should I pass to TemporalConvNet(3, [3] * layers) for layers?

    opened by GF-Huang 0
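A common rule of thumb for the question above is to pick enough levels for the receptive field to cover the longest sequence. The sketch below assumes a TCN in which level i uses dilation 2**i and each level contains two causal convolutions; the kernel sizes used in the example are assumptions, and the function names are hypothetical.

```python
def receptive_field(levels, kernel_size, convs_per_level=2):
    """Receptive field of stacked causal convs with dilation 2**i at level i."""
    return 1 + convs_per_level * (kernel_size - 1) * (2 ** levels - 1)

def min_levels(seq_len, kernel_size, convs_per_level=2):
    """Smallest number of levels whose receptive field covers seq_len steps."""
    levels = 1
    while receptive_field(levels, kernel_size, convs_per_level) < seq_len:
        levels += 1
    return levels

print(min_levels(400, kernel_size=2))  # prints 8 (receptive field 511)
print(min_levels(400, kernel_size=7))  # prints 6 (receptive field 757)
```

Each causal convolution with kernel size k and dilation d extends the history by (k - 1) * d steps, which is where the closed form comes from. A larger kernel reaches the same coverage with fewer levels.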
Owner
CMU Locus Lab
Zico Kolter's Research Group
This repository implements and evaluates convolutional networks on the Möbius strip as toy model instantiations of Coordinate Independent Convolutional Networks.

Orientation independent Möbius CNNs This repository implements and evaluates convolutional networks on the Möbius strip as toy model instantiations of

Maurice Weiler 59 Dec 9, 2022
[ICLR'19] Trellis Networks for Sequence Modeling

TrellisNet for Sequence Modeling This repository contains the experiments done in paper Trellis Networks for Sequence Modeling by Shaojie Bai, J. Zico

CMU Locus Lab 460 Oct 13, 2022
Learning from History: Modeling Temporal Knowledge Graphs with Sequential Copy-Generation Networks

CyGNet This repository reproduces the AAAI'21 paper “Learning from History: Modeling Temporal Knowledge Graphs with Sequential Copy-Generation Network

CunchaoZ 89 Jan 3, 2023
Implementation of the "Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos" paper.

Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos Introduction Point cloud videos exhibit irregularities and lack of or

Hehe Fan 101 Dec 29, 2022
Spontaneous Facial Micro Expression Recognition using 3D Spatio-Temporal Convolutional Neural Networks

Spontaneous Facial Micro Expression Recognition using 3D Spatio-Temporal Convolutional Neural Networks Abstract Facial expression recognition in video

Bogireddy Sai Prasanna Teja Reddy 103 Dec 29, 2022
Spatial Temporal Graph Convolutional Networks (ST-GCN) for Skeleton-Based Action Recognition in PyTorch

Reminder ST-GCN has transferred to MMSkeleton, and keep on developing as an flexible open source toolbox for skeleton-based human understanding. You a

sijie yan 1.1k Dec 25, 2022
Codes for TIM2021 paper "Anchor-Based Spatio-Temporal Attention 3-D Convolutional Networks for Dynamic 3-D Point Cloud Sequences"

Intelligent Robotics and Machine Vision Lab 4 Jul 19, 2022
Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning (ICLR 2021)

Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning (ICLR 2021) Citation Please cite as: @inproceedings{liu2020understan

Sunbow Liu 22 Nov 25, 2022
Official repository of OFA. Paper: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Paper | Blog OFA is a unified multimodal pretrained model that unifies modalities (i.e., cross-modality, vision, language) and tasks (e.g., image gene

OFA Sys 1.4k Jan 8, 2023
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Segmentation Transformer Implementation of Segmentation Transformer in PyTorch, a new model to achieve SOTA in semantic segmentation while using trans

Abhay Gupta 161 Dec 8, 2022
Implementation of SETR model, Original paper: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers.

SETR - Pytorch Since the original paper (Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers.) has no official

zhaohu xing 112 Dec 16, 2022
[CVPR 2021] Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Fudan Zhang Vision Group 897 Jan 5, 2023
Sequence to Sequence Models with PyTorch

Sequence to Sequence models with PyTorch This repository contains implementations of Sequence to Sequence (Seq2Seq) models in PyTorch At present it ha

Sandeep Subramanian 708 Dec 19, 2022
Sequence-to-Sequence learning using PyTorch

Seq2Seq in PyTorch This is a complete suite for training sequence-to-sequence models in PyTorch. It consists of several models and code to both train

Elad Hoffer 514 Nov 17, 2022
An implementation of a sequence to sequence neural network using an encoder-decoder

Keras implementation of a sequence to sequence model for time series prediction using an encoder-decoder architecture. I created this post to share a

Luke Tonin 195 Dec 17, 2022
Sequence lineage information extracted from RKI sequence data repo

Pango lineage information for German SARS-CoV-2 sequences This repository contains a join of the metadata and pango lineage tables of all German SARS-

Cornelius Roemer 24 Oct 26, 2022
Cascaded Deep Video Deblurring Using Temporal Sharpness Prior and Non-local Spatial-Temporal Similarity

This repository is the official PyTorch implementation of Cascaded Deep Video Deblurring Using Temporal Sharpness Prior and Non-local Spatial-Temporal Similarity

hippopmonkey 4 Dec 11, 2022
Clean and readable code for Decision Transformer: Reinforcement Learning via Sequence Modeling

Minimal implementation of Decision Transformer: Reinforcement Learning via Sequence Modeling in PyTorch for mujoco control tasks in OpenAI gym

Nikhil Barhate 104 Jan 6, 2023
CVPR2021: Temporal Context Aggregation Network for Temporal Action Proposal Refinement

Temporal Context Aggregation Network - Pytorch This repo holds the pytorch-version codes of paper: "Temporal Context Aggregation Network for Temporal

Zhiwu Qing 63 Sep 27, 2022