Sequence Modeling with Structured State Spaces

Overview

Structured State Spaces for Sequence Modeling

This repository provides implementations and experiments for the following papers.

S4

Structured State Spaces

Efficiently Modeling Long Sequences with Structured State Spaces
Albert Gu, Karan Goel, Christopher Ré
Paper: https://arxiv.org/abs/2111.00396

LSSL

Linear State Space Layer

Combining Recurrent, Convolutional, and Continuous-time Models with the Linear State Space Layer
Albert Gu, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra, Christopher Ré
Paper: https://arxiv.org/abs/2110.13985

HiPPO

HiPPO Framework

HiPPO: Recurrent Memory with Optimal Polynomial Projections
Albert Gu*, Tri Dao*, Stefano Ermon, Atri Rudra, Christopher Ré
Paper: https://arxiv.org/abs/2008.07669

Setup

Requirements

This repository requires Python 3.8+ and PyTorch 1.9+. Other packages are listed in requirements.txt.

Data

Datasets and Dataloaders

All logic for creating and loading datasets is in src/dataloaders. This folder includes many old and experimental datasets. The datasets that we consider core are located in src/dataloaders/datasets.py.

The raw data should be organized as follows. The data path can be configured by the environment variable DATA_PATH and defaults to ./data, where . is the top-level directory of this repository (e.g. 'state-spaces').


External datasets include Long Range Arena (LRA), which can be downloaded from their GitHub page.

These external datasets should be organized as follows:

DATA_PATH/
  pathfinder/
    pathfinder32/
    pathfinder64/
    pathfinder128/
    pathfinder256/
  aan/
  listops/

Fine-grained control over the data directory is allowed, e.g. if the LRA ListOps files are located in /home/lra/listops-1000/, you can pass in +dataset.data_dir=/home/lra/listops-1000 on the command line.
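The DATA_PATH fallback behavior described above can be pictured as follows; this is a minimal sketch of the convention, not the repository's exact code, and the listops lookup is a hypothetical example:

    import os
    from pathlib import Path

    # Use $DATA_PATH if set, otherwise fall back to ./data
    DATA_PATH = Path(os.environ.get("DATA_PATH", "./data"))
    listops_dir = DATA_PATH / "listops"  # hypothetical per-dataset subdirectory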

Cauchy Kernel

A core operation of S4 is the "Cauchy kernel" described in the paper. Its implementation requires one of two methods; both compute the same quantity (a reference sketch appears at the end of this section):

Custom CUDA Kernel

This version is faster but requires manual compilation on each machine. Run python setup.py install from the directory extensions/cauchy/.

Pykeops

This version is provided by the pykeops library. Installation usually works out of the box with pip install pykeops cmake, both of which are included in the requirements file.

Note that running in a Colab requires installing a different pip package; instructions can be found in the pykeops documentation.
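For reference, the Cauchy kernel amounts to evaluating sums of the form sum_n v[n] / (z[l] - w[n]) at many points z[l]. Below is a minimal (slow) PyTorch sketch of that computation, useful only for understanding what the two fast backends compute; the function name and tensor shapes are illustrative assumptions, not the repository's interface:

    import torch

    def cauchy_naive(v, z, w):
        # v, w: (..., N) residues and poles; z: (..., L) evaluation points
        # Returns (..., L): sum over n of v[n] / (z[l] - w[n]) for each l.
        return (v.unsqueeze(-1) / (z.unsqueeze(-2) - w.unsqueeze(-1))).sum(dim=-2)

Both fast methods exist to avoid materializing the full (N, L) intermediate tensor that this naive version creates.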

S4 Experiments

This section describes how to use the latest S4 model and reproduce experiments immediately. More detailed descriptions of the infrastructure are in the subsequent sections.

Structured State Space (S4)

The S4 module is found at src/models/sequence/ss/s4.py.

For users who would like to import a single file that has the self-contained S4 layer, a standalone version can be found at src/models/sequence/ss/standalone/s4.py.
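A hypothetical usage sketch of the standalone layer is below; the constructor argument and input layout shown are assumptions based on common conventions in this codebase, so check the file itself for the exact signature:

    import torch
    from src.models.sequence.ss.standalone.s4 import S4

    layer = S4(d_model=128)        # d_model is an assumed argument name
    x = torch.randn(2, 128, 1024)  # assumed layout: (batch, d_model, length)
    out = layer(x)                 # some versions return an (output, state) tuple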

Testing

For testing, we frequently use synthetic datasets or the Permuted MNIST dataset. This can be run with python -m train wandb=null pipeline=mnist model=s4, which should reach around 90% accuracy after one epoch, taking 2-4 minutes depending on the GPU.

Long Range Arena (LRA)

python -m train wandb=null experiment=s4-lra-listops
python -m train wandb=null experiment=s4-lra-imdb
python -m train wandb=null experiment=s4-lra-cifar
python -m train wandb=null experiment=s4-lra-aan
python -m train wandb=null experiment=s4-lra-pathfinder
python -m train wandb=null experiment=s4-lra-pathx

Note that these experiments may take different amounts of time to train. IMDB should take just 1-2 hours, while Path-X requires several epochs before the accuracy takes off and over a day to train to completion.

CIFAR-10

python -m train wandb=null experiment=s4-cifar

The above command line reproduces our best sequential CIFAR model. Decreasing the model size should yield close results, e.g. halving the hidden dimension with model.d_model=512.

Speech Commands

The Speech Commands dataset we compare against is a modified, smaller 10-way classification task.

python -m train wandb=null experiment=s4-sc

To use the original version with the full 35 classes, pass in dataset.all_classes=true.

Training

The core training infrastructure of this repository is based on PyTorch Lightning, with a configuration scheme based on Hydra. The structure of this integration largely follows the Lightning+Hydra integration template described in https://github.com/ashleve/lightning-hydra-template.

The main experiment entrypoint is train.py and configs are found in configs/. In brief, the main config is found at configs/config.yaml, which is combined with other sets of configs that can be passed on the command line to define an overall YAML config. Most config groups define a single Python object (e.g. a PyTorch nn.Module). The end-to-end training pipeline can be broken down into the following rough groups, where group XX is found under configs/XX/:

model: the sequence-to-sequence model backbone (e.g. a src.models.sequence.SequenceModel)
dataset: the raw dataset (data/target pairs) (e.g. a PyTorch Dataset)
loader: how the data is loaded (e.g. a PyTorch DataLoader)
encoder: defines a Module that interfaces between data and model backbone
decoder: defines a Module that interfaces between model backbone and targets
task: specifies loss and metrics

Default combinations of dataset+loader+encoder+decoder+task are further consolidated into groups called pipelines.

A run can be performed by passing in a pipeline config, model config, and any additional arguments modifying the default configurations. A simple example experiment is

python -m train pipeline=mnist dataset.permute=True model=s4 model.n_layers=3 model.d_model=128 model.norm=batch model.prenorm=True wandb=null

This uses the permuted sequential MNIST task and uses an s4 model with a specified number of layers, backbone dimension, and normalization type.

Hydra

It is recommended to read the Hydra documentation to fully understand the configuration framework. For help launching specific experiments, please file an Issue.

Registries

This codebase uses a modification of the hydra instantiate utility that provides shorthand names of different classes, for convenience in configuration and logging. The mapping from shorthand to full path can be found in src/utils/registry.py.
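Schematically, such a registry is just a mapping from shorthand names to import paths, so configs and logs can say s4 instead of spelling out a full module path. The entries below are illustrative only, not the actual contents of src/utils/registry.py:

    # Hypothetical registry entries: shorthand -> full class path
    model = {
        "s4": "src.models.sequence.ss.s4.S4",
        "lstm": "torch.nn.LSTM",
    }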

WandB

Logging with WandB is built into this repository. To use it, simply set your WANDB_API_KEY environment variable and change the wandb.project attribute of configs/config.yaml (or pass it on the command line, e.g. python -m train ... wandb.project=s4).

Set wandb=null to turn off WandB logging.

Models

This repository provides a modular and flexible implementation of sequence models at large.

SequenceModule

SequenceModule (src/models/sequence/base.py) is the abstract interface that all sequence models adhere to. In this codebase, a sequence model is defined as a sequence-to-sequence map of shape (batch size, sequence length, input dimension) to (batch size, sequence length, output dimension).

The SequenceModule comes with other methods, such as step, which is meant for autoregressive settings, and logic to carry optional hidden states (for stateful models such as RNNs or S4).
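A toy module illustrating this contract; the real SequenceModule in src/models/sequence/base.py defines additional properties, and this sketch only shows the sequence-to-sequence and step shapes described above:

    import torch
    from torch import nn

    class ToySequenceModule(nn.Module):
        """Maps (batch, length, d_input) -> (batch, length, d_output)."""

        def __init__(self, d_input, d_output):
            super().__init__()
            self.linear = nn.Linear(d_input, d_output)

        def forward(self, x, state=None):
            # x: (batch, length, d_input); this toy model is stateless,
            # so the optional hidden state passes through unchanged
            return self.linear(x), state

        def step(self, x, state=None):
            # x: (batch, d_input) -- one timestep, for autoregressive settings
            return self.linear(x), state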

SequenceModel

SequenceModel (src/models/sequence/model.py) is the main backbone, with configurable options for residual function, normalization placement and type, etc. SequenceModel accepts a black-box config for a layer. Compatible layers are SequenceModules (i.e. composable sequence transformations) found under src/models/sequence/.

S4

This is the main model of this repository. See the S4 Experiments section above for instructions.

LSSL

The LSSL is an older version of S4. It is currently not recommended for use, but the model can be found at src/models/sequence/ss/lssl.py.

It can be run with model/layer=lssl, or with model/layer=lssl model.layer.learn=0 for the LSSL-fixed model, which does not train A, B, or dt.

HiPPO

HiPPO is the mathematical framework that the HiPPO, LSSL, and S4 papers are built on. The logic for HiPPO operators is found under src/models/hippo/.

HiPPO-RNN cells from the original paper (https://arxiv.org/abs/2008.07669) can be found with the RNN cells (see RNNs below).
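As a concrete reference, the HiPPO-LegS transition matrix from that paper can be written in a few lines. This is a direct sketch of the published formula, not the repository's (vectorized) implementation:

    import numpy as np

    def hippo_legs_matrix(N):
        # HiPPO-LegS matrix A (N x N) from https://arxiv.org/abs/2008.07669:
        # A[n, k] = -sqrt((2n+1)(2k+1)) if n > k, -(n+1) if n == k, 0 if n < k
        A = np.zeros((N, N))
        for n in range(N):
            for k in range(n + 1):
                A[n, k] = -(n + 1) if n == k else -np.sqrt((2 * n + 1) * (2 * k + 1))
        return A

State space models built on HiPPO use such a matrix as the state transition A in x'(t) = Ax(t) + Bu(t).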

RNNs

This codebase contains a flexible and modular implementation of many RNN cells.

Some examples include model=rnn/hippo-legs and model=rnn/hippo-legt for HiPPO variants from the original paper, or model=rnn/gru for a GRU reimplementation, etc.

An exception is model=lstm to use the PyTorch LSTM.

Example command (reproducing the Permuted MNIST number from the HiPPO paper, which was SotA at the time):

python train.py pipeline=mnist model=rnn/hippo-legs model.cell_args.hidden_size=512 train.epochs=50 train.batch_size=100 train.lr=0.001

Baselines

Other sequence models are easily incorporated into this repository, and several other baselines have been ported.

These include CNNs such as the WaveGAN Discriminator and CKConv and continuous-time/RNN models such as UnICORNN and LipschitzRNN.

python -m train dataset=mnist model={ckconv,unicornn}

Overall Repository Structure

configs/         config files for model, data pipeline, training loop, etc.
data/            default location of raw data
extensions/      CUDA extension for Cauchy kernel
src/             main source code for models, datasets, etc.
train.py         main entrypoint

Citation

If you use this codebase, or otherwise find our work valuable, please cite:

@article{gu2021efficiently,
  title={Efficiently Modeling Long Sequences with Structured State Spaces},
  author={Gu, Albert and Goel, Karan and R{\'e}, Christopher},
  journal={arXiv preprint arXiv:2111.00396},
  year={2021}
}

@article{gu2021combining,
  title={Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers},
  author={Gu, Albert and Johnson, Isys and Goel, Karan and Saab, Khaled and Dao, Tri and Rudra, Atri and R{\'e}, Christopher},
  journal={Advances in neural information processing systems},
  volume={34},
  year={2021}
}

@article{gu2020hippo,
  title={HiPPO: Recurrent Memory with Optimal Polynomial Projections},
  author={Gu, Albert and Dao, Tri and Ermon, Stefano and Rudra, Atri and Re, Christopher},
  journal={Advances in neural information processing systems},
  volume={33},
  year={2020}
}
Comments
  • S4D Memory Requirements

Hey, I wanted to give S4D a quick try in my research as a drop-in replacement for S4 (which, as far as I gathered, should be a good way to start), but I'm running into some hard memory limitations. I'm trying to train the DiffWave version of SaShiMi as a first experiment, but the memory requirements seem to increase significantly when replacing S4 with an equivalent S4D layer (with default settings), causing the model to go OOM in my case (so I don't have any precise measurements, but it's at least a 20% increase in overall memory consumption). I use the parameters as discussed in #46. Is this something you'd expect?

    opened by stefan-baumann 26
  • Model can not converge on the LRA Pathfinder

    Hi,

Thanks for the great work! When I ran your code on the LRA Pathfinder dataset (using your config), I found it had not converged by the end of the 200th epoch, as shown in the following log: loss=0.693, val/accuracy=0.499, val/loss=0.693, test/accuracy=0.495, test/loss=0.693, train/accuracy=0.501, train/loss=0.693. The loss stays at 0.693 throughout training.

    Do you have any thoughts on this? Thanks!

    opened by violet-zct 19
  • SaShiMi generation script errors out with own models

Hey, first of all, great work with the repository. I don't think I've ever worked with a repository for a paper that's this extensive and well-structured.

I'm currently trying to train the SaShiMi model on my own dataset (following your guide here: https://github.com/HazyResearch/state-spaces/issues/23), and I run into some issues when trying to generate samples with the trained model. In case this is relevant: I'm trying to do inference on the checkpoint files, and I changed the number of layers (model.n_layers) to 4 to accommodate the memory limitations of my GPU. Apart from that, I have made no changes to the training or model code except for switching the dataset to my own. When I try to call the generation.py script now, I run into a range of errors:

• The config overrides cause some errors: the hurwitz parameter does not exist anymore, and the setup_step methods don't seem to correctly accept the mode argument (or rather, pass it downstream). I "fixed" this by removing the hurwitz argument override and by adding the mode argument to all module.setup_step() methods, passing it downstream as required.
    • Additionally, setting model.layer.postact=null causes the state_dict to not load successfully anymore, giving me the following error:
    Missing key(s) in state_dict: "model.c_layers.0.layer.output_linear.weight", "model.c_layers.0.layer.output_linear.bias", "model.c_layers.2.layer.output_linear.weight", "model.c_layers.2.layer.output_linear.bias", "model.c_layers.4.layer.output_linear.weight", "model.c_layers.4.layer.output_linear.bias", "model.c_layers.6.layer.output_linear.weight", "model.c_layers.6.layer.output_linear.bias", "model.u_layers.0.1.layer.output_linear.weight", "model.u_layers.0.1.layer.output_linear.bias", "model.u_layers.0.3.layer.output_linear.weight", "model.u_layers.0.3.layer.output_linear.bias", "model.u_layers.0.5.layer.output_linear.weight", "model.u_layers.0.5.layer.output_linear.bias", "model.u_layers.0.7.layer.output_linear.weight", "model.u_layers.0.7.layer.output_linear.bias", "model.u_layers.1.1.layer.output_linear.weight", "model.u_layers.1.1.layer.output_linear.bias", "model.u_layers.1.3.layer.output_linear.weight", "model.u_layers.1.3.layer.output_linear.bias", "model.u_layers.1.5.layer.output_linear.weight", "model.u_layers.1.5.layer.output_linear.bias", "model.u_layers.1.7.layer.output_linear.weight", "model.u_layers.1.7.layer.output_linear.bias". 
    Unexpected key(s) in state_dict: "model.c_layers.0.layer.output_linear.0.weight", "model.c_layers.0.layer.output_linear.0.bias", "model.c_layers.2.layer.output_linear.0.weight", "model.c_layers.2.layer.output_linear.0.bias", "model.c_layers.4.layer.output_linear.0.weight", "model.c_layers.4.layer.output_linear.0.bias", "model.c_layers.6.layer.output_linear.0.weight", "model.c_layers.6.layer.output_linear.0.bias", "model.u_layers.0.1.layer.output_linear.0.weight", "model.u_layers.0.1.layer.output_linear.0.bias", "model.u_layers.0.3.layer.output_linear.0.weight", "model.u_layers.0.3.layer.output_linear.0.bias", "model.u_layers.0.5.layer.output_linear.0.weight", "model.u_layers.0.5.layer.output_linear.0.bias", "model.u_layers.0.7.layer.output_linear.0.weight", "model.u_layers.0.7.layer.output_linear.0.bias", "model.u_layers.1.1.layer.output_linear.0.weight", "model.u_layers.1.1.layer.output_linear.0.bias", "model.u_layers.1.3.layer.output_linear.0.weight", "model.u_layers.1.3.layer.output_linear.0.bias", "model.u_layers.1.5.layer.output_linear.0.weight", "model.u_layers.1.5.layer.output_linear.0.bias", "model.u_layers.1.7.layer.output_linear.0.weight", "model.u_layers.1.7.layer.output_linear.0.bias".
    

    Does this mean that I should rename those keys manually (there's a fairly clear correspondence) to make it work after changing the activation?

    • Finally, even when I pass through the mode parameter in module.setup_step(), I still get this error:
    Traceback (most recent call last):
      File "/home/debaumas/state-spaces/sashimi/generation.py", line 192, in main
        module.setup_step(mode='dense')
      File "/home/debaumas/state-spaces/src/models/sequence/ss/kernel.py", line 1038, in setup_step
        self.kernel.setup_step(mode=mode)
      File "/home/debaumas/state-spaces/src/models/sequence/ss/kernel.py", line 515, in setup_step
        dC = torch.linalg.solve(
    torch._C._LinAlgError: linalg.solve: (Batch element 0): The diagonal element 1 is zero, the solve could not be completed because the input matrix is singular.
    

    Do you have any idea what might be causing this and maybe an idea about how to fix/circumvent this?

    It'd be awesome if you could help point me in the right direction with this.

    Best, Stefan

    opened by stefan-baumann 16
  • GPU Out of Memory

    I was wondering what parameters I could change to be able to run it on GPU with limited RAM. I tried reducing the layers to 4, which did not help. Also, it seems like batch size is set to 1 by default. I am using 4x TITAN RTX 24GB.

    opened by davidmrau 13
  • Inconsistent results of forward (training) and step (inference)

Hi, I did a simple test to verify the difference between forward and step (mode="dense") on a single unidirectional S4 layer. Given a random sequence, there is a difference: the absolute error is around 1e-2 and the squared error is around 1e-4. I suspect these results are wrong. My verification follows test_step() in src/models/sequence/ss/kernel.py. I'd love to know if you have examples that clearly compare their difference. Thanks :)

    opened by cycycycywu 10
  • Multiple `setup_rnn` calls

    Hey!

    I'm attempting to integrate the Sashimi Backbone into some audio models -- I'd like to train in convolutional mode and run validation inference in RNN mode, but my reading of the code seems to imply that the setup_step call isn't repeatable or reversible (#67 seems to imply this as well).

    In the case that I temporarily want to infer in RNN mode, but then switch back to the convolutional training mode, what's my best option?

    opened by kradonneoh 7
  • Unable to load wt103 checkpoint, size mismatch

    Hi Albert,

    I tried to load your recently uploaded wikitext-103 checkpoint, but encountered the following error:

RuntimeError: Error(s) in loading state_dict for SequenceLightningModule:
    size mismatch for encoder.0.emb_layers.3.weight: copying a param with shape torch.Size([67738, 16]) from checkpoint, the shape in current model is torch.Size([67737, 16]).
    size mismatch for loss.out_layers_biases.3: copying a param with shape torch.Size([67738]) from checkpoint, the shape in current model is torch.Size([67737]).
    size mismatch for loss_val.out_layers_biases.3: copying a param with shape torch.Size([67738]) from checkpoint, the shape in current model is torch.Size([67737]).

Do you know why this is? I used the wt103 data downloaded from the transformer-xl repo: https://github.com/kimiyoung/transformer-xl/blob/master/getdata.sh.

    Thanks!

    opened by violet-zct 6
  • ValueError when running on Pathfinder

    Hi, I am getting the following error when trying to train S4 on the pathfinder dataset. Any help would be greatly appreciated.

Traceback (most recent call last):
  File "/data/al451/state-spaces/train.py", line 553, in main
    train(config)
  File "/data/al451/state-spaces/train.py", line 498, in train
    trainer.fit(model)
  File "/home/al451/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 768, in fit
    self._call_and_handle_interrupt(
  File "/home/al451/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 721, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/al451/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 809, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/home/al451/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1172, in _run
    self._call_setup_hook()  # allow user to setup lightning_module in accelerator environment
  File "/home/al451/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1492, in _call_setup_hook
    self._call_lightning_module_hook("setup", stage=fn)
  File "/home/al451/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1593, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/data/al451/state-spaces/train.py", line 56, in setup
    self.dataset.setup()
  File "/data/al451/state-spaces/src/dataloaders/datasets.py", line 1234, in setup
    dataset = PathFinderDataset(self.data_dir, transform=self.default_transforms())
  File "/data/al451/state-spaces/src/dataloaders/datasets.py", line 1130, in __init__
    path_list = sorted(
  File "/data/al451/state-spaces/src/dataloaders/datasets.py", line 1132, in <lambda>
    key=lambda path: int(path.stem),
ValueError: invalid literal for int() with base 10: '._142'

    Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

    opened by andrewliu2001 5
  • Experiment reproduction issue with updated modules

    Hi, I was trying to reproduce some of your results using the SaShiMi model by running the command

    python -m train experiment=sashimi-sc09 wandb=null
    

    but I get the error

    TypeError: __init__() got an unexpected keyword argument 'pool'
    

due to the DownPool class no longer needing the pool parameter for initialization.

    Can I ask if there are any plans to fix these issues so that they work with the current implementations of the different modules?

    opened by jhuang265 5
  • Unused parameters in training

    Hi! I'm running some experiments using your code. For my use-case, I'm using torch.nn.DistributedDataParallel, which automatically detects unused parameters, i.e., parameters that get no gradients.

    The unused parameters are:

    • D (from the S4 module)
    • output_linear.weight and output_linear.bias (from the S4 module). These are instances of the TransposedLinear layer.
    • kernel.C (from SSKernelNPLR).

    I have manually confirmed these parameters don't get gradients by running the following code after computing the loss:

    for name, param in model.named_parameters():
        if param.grad is None:
            print(name)
    

    Usually, the above means the parameters are instantiated but not used. In this case, surprisingly, all the parameters get used in the forward method. However, none of them get used in "vanilla" PyTorch ops. D, output_linear.weight and output_linear.bias get used through opt_einsum.contract, and kernel.C gets used through your Cauchy GPU op.

    Can you confirm the issue on your end? These parameters all look important for the model.

    opened by tjppires 5
  • can't run train.py w/o compiling Cauchy kernel for CUDA

    Dear all,

    I am having trouble compiling the Cauchy kernel, and although I have installed pykeops, running train.py always results in errors like this:

RuntimeError: [KeOps] This KeOps shared object has been compiled without cuda support:

1. to perform computations on CPU, simply set tagHostDevice to 0
2. to perform computations on GPU, please recompile the formula with a working version of cuda.

The only thing that fixed the issue for me was commenting out the following try/except. Without that (sorry for its ugliness...) the code never defaulted back to the slow kernel; now it does, but that is certainly not the right way to go about it ;)

I wonder if the try/except needs to check whether the kernel actually runs, not just whether it can be imported?

    '''
    try:
        import pykeops
        from src.models.functional.cauchy import cauchy_conj
        has_pykeops = True
    except ImportError:
        has_pykeops = False
        from src.models.functional.cauchy import cauchy_conj_slow
        if not has_cauchy_extension:
            log.error(
                "Falling back on slow Cauchy kernel. Install at least one of pykeops or the CUDA extension for efficiency."
            )
    '''
    has_pykeops = False
    from src.models.functional.cauchy import cauchy_conj_slow
    if not has_cauchy_extension:
        log.error(
            "Falling back on slow Cauchy kernel. Install at least one of pykeops or the CUDA extension for efficiency."
        )

    opened by DorotheaKolossa 5
  • Interchangeability of cauchy kernel methods

    Hi Albert,

    I've been training an image generation model using the S4 module and the cauchy extension (compiled code on local machine). Within a model trained with the cauchy extension, would you expect performance differences if the naive implementation (slow kernel) was used for evaluation? Or is the naive implementation less robust than the extension?

    My goal is to visually inspect a few things, but am experiencing problems using the extension in a jupyter notebook (even when placing all tensors/models on a GPU).

    Thanks again for your time, Tommy

    opened by tlasmanbotch 0
  • Training on midi

I want to train an S4 model on MIDI data, which I already have as discrete events, for example in the form of .csv. How do I create a dataset with which I can train?
In other words, how can I create a dataset consisting of three columns in a .csv file, and how can I train on such files with python -m train? I would be very grateful to get an answer as soon as possible.

This is what the csv data looks like:

    960 | 70 | 0
    0   | 70 | 74
    120 | 70 | 0
    120 | 70 | 70
    120 | 70 | 0

    opened by ep-oi 3
  • CVE-2007-4559 Patch

    Patching CVE-2007-4559

Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks to see if all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog post.

If you have further questions you may contact us through this project's lead researcher, Kasimir Schulz.

    opened by TrellixVulnTeam 0
  • Understanding the moving parts for training

    Hi team!

    The paper and the accompanying codebase are really great! We are trying to use S4 for a different problem and there are a lot of engineering details that seem to be affecting the training:

    1. Adding/removing GLU activation => Also affects ListOps a lot
    2. Dropout vs Dropout2D
    3. Learning rate of the state space parameters and their schedule in relation to rest of parameters
    4. Whether to share the diagonal and low-rank params depth-wise or not
    5. Whether to share log_step depth-wise or not => Leads to NaN loss at times
    6. Whether to use the NPLR formulation from S4 paper or Sashimi
    7. Whether to use S4 or S4D or DSS
    8. Bidirectional or unidirectional ...

    From your experience, could you share any ideas on how to choose from these options? Also, could you list any other important details that might affect training?

    opened by nbgundavarapu 1
  • Getting `KeyError: 'nvrtc'` on CPU-only machine

    Versions:

    • Python 3.9.14
    • macOS 12.6
    • state-spaces@292984c
    • ludwig@1d8154f

    Hello!

    I'd like to add S4 into the Ludwig OSS project. I've successfully imported and initialized the S4 module. However, in the forward pass, I am running into the following error:

    [KeOps] Warning : Cuda libraries were not detected on the system ; using cpu only mode
    Traceback (most recent call last):
      File "/Users/geoffreyangus/repositories/predibase/ludwig/ludwig/modules/s4_modules.py", line 1670, in <module>
        main()
      File "/Users/geoffreyangus/repositories/predibase/ludwig/ludwig/modules/s4_modules.py", line 1666, in main
        module(torch.randn(2, 16, 100))
      File "/Users/geoffreyangus/repositories/predibase/ludwig/venv39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1423, in _call_impl
        return forward_call(*input, **kwargs)
      File "/Users/geoffreyangus/repositories/predibase/ludwig/ludwig/modules/s4_modules.py", line 1582, in forward
        k, k_state = self.kernel(L=L_kernel, rate=rate, state=state)  # (C H L) (B C H L)
      File "/Users/geoffreyangus/repositories/predibase/ludwig/venv39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1423, in _call_impl
        return forward_call(*input, **kwargs)
      File "/Users/geoffreyangus/repositories/predibase/ludwig/ludwig/modules/s4_modules.py", line 1388, in forward
        return self.kernel(state=state, L=L, rate=rate)
      File "/Users/geoffreyangus/repositories/predibase/ludwig/venv39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1423, in _call_impl
        return forward_call(*input, **kwargs)
      File "/Users/geoffreyangus/repositories/predibase/ludwig/ludwig/modules/s4_modules.py", line 821, in forward
        r = cauchy_conj(v, z, w)
      File "/Users/geoffreyangus/repositories/predibase/ludwig/ludwig/modules/s4_modules.py", line 59, in cauchy_conj
        r = 2 * cauchy_mult(v, z, w, backend="GPU")
      File "/Users/geoffreyangus/repositories/predibase/ludwig/venv39/lib/python3.9/site-packages/pykeops/torch/generic/generic_red.py", line 624, in __call__
        out = GenredAutograd.apply(
      File "/Users/geoffreyangus/repositories/predibase/ludwig/venv39/lib/python3.9/site-packages/pykeops/torch/generic/generic_red.py", line 78, in forward
        myconv = keops_binder["nvrtc" if tagCPUGPU else "cpp"](
    KeyError: 'nvrtc'
    

    Here is the code snippet I am currently running from within the file implementing the S4 class:

    def main():
        module = S4(
            16,
            gate=4,  # Multiplicative gating layer that also expands dimension by factor of 4
            bottleneck=4,  # Reduce dimension of SSM by factor of 4
            measure="legs",  # Randomly initialize A
            dt_min=1.0,
            dt_max=1.0,  # Initialize dt to 1.0
            lr={"dt": 0.0, "B": 0.0},  # Freeze B and dt
        )
        module(torch.randn(2, 16, 100))
    
    
    if __name__ == "__main__":
        main()
    

    To reproduce the error:

    1. Download Ludwig on a CPU-only machine. cd into the repository.
    2. Checkout the hack-s4 branch.
    3. Run python ludwig/modules/s4_modules.py.

    Let me know what you think, thanks!

    opened by geoffreyangus 2