Sequence Modeling with Structured State Spaces

Overview

Structured State Spaces for Sequence Modeling

This repository provides implementations and experiments for the following papers.

S4

Structured State Spaces

Efficiently Modeling Long Sequences with Structured State Spaces
Albert Gu, Karan Goel, Christopher Ré
Paper: https://arxiv.org/abs/2111.00396

LSSL

Linear State Space Layer

Combining Recurrent, Convolutional, and Continuous-time Models with the Linear State Space Layer
Albert Gu, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra, Christopher Ré
Paper: https://arxiv.org/abs/2110.13985

HiPPO

HiPPO Framework

HiPPO: Recurrent Memory with Optimal Polynomial Projections
Albert Gu*, Tri Dao*, Stefano Ermon, Atri Rudra, Christopher Ré
Paper: https://arxiv.org/abs/2008.07669

Setup

Requirements

This repository requires Python 3.8+ and PyTorch 1.9+. Other packages are listed in requirements.txt.
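Dependencies can be installed with:

pip install -r requirements.txt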

Data

Datasets and Dataloaders

All logic for creating and loading datasets is in src/dataloaders. This folder includes many old and experimental datasets. The datasets that we consider core are located in src/dataloaders/datasets.py.

The raw data should be organized as described below. The data path can be configured by the environment variable DATA_PATH and defaults to ./data, where . is the top-level directory of this repository (e.g. 'state-spaces').
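Concretely, the path resolution behaves like the following short sketch (illustrative only, not the repository's actual code):

import os
from pathlib import Path

# Use $DATA_PATH if set; otherwise fall back to ./data relative to
# the repository root.
data_path = Path(os.environ.get("DATA_PATH", "./data"))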

External Data

External datasets include Long Range Arena (LRA), which can be downloaded from their GitHub page.

These external datasets should be organized as follows:

DATA_PATH/
  pathfinder/
    pathfinder32/
    pathfinder64/
    pathfinder128/
    pathfinder256/
  aan/
  listops/

Fine-grained control over the data directory is possible, e.g. if the LRA ListOps files are located in /home/lra/listops-1000/, you can pass in +dataset.data_dir=/home/lra/listops-1000 on the command line.

Cauchy Kernel

A core operation of S4 is the "Cauchy kernel" described in the paper. Implementing it efficiently requires one of two methods:
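For intuition, a naive reference version of this operation is just a sum of rational terms; the sketch below (function name chosen here for illustration) materializes the full Cauchy matrix, which both methods described next avoid:

import torch

def cauchy_naive(v, z, w):
    # k(z_i) = sum_j v_j / (z_i - w_j), summed over the state dimension.
    # v, w: (..., N) complex tensors; z: (..., L) complex tensor.
    return (v.unsqueeze(-1) / (z.unsqueeze(-2) - w.unsqueeze(-1))).sum(dim=-2)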

Custom CUDA Kernel

This version is faster but requires manual compilation on each machine. Run python setup.py install from the extensions/cauchy/ directory.

Pykeops

This version is provided by the pykeops library. Installation usually works out of the box with pip install pykeops cmake, both of which are included in the requirements file.

Note that running in a Colab requires installing a different pip package; instructions can be found in the pykeops documentation.
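The pykeops package also ships a quick sanity check for the torch bindings, which can help diagnose installation problems:

import pykeops
pykeops.test_torch_bindings()  # prints a success message if the bindings work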

S4 Experiments

This section describes how to use the latest S4 model and reproduce experiments immediately. More detailed descriptions of the infrastructure are in the subsequent sections.

Structured State Space (S4)

The S4 module is found at src/models/sequence/ss/s4.py.

For users who would like a single self-contained file containing the S4 layer, a standalone version can be found at src/models/sequence/ss/standalone/s4.py.
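A minimal usage sketch of the standalone layer follows; note that the constructor arguments, the (batch, d_model, length) input layout, and the (output, state) return convention are assumptions that may vary across versions of the file:

import torch
from src.models.sequence.ss.standalone.s4 import S4

layer = S4(d_model=256)          # assumed constructor signature
x = torch.randn(8, 256, 1024)    # assumed (batch, d_model, length) layout
y, state = layer(x)              # assumed to return (output, state)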

Testing

For testing, we frequently use synthetic datasets or the Permuted MNIST dataset. This can be run with python -m train wandb=null pipeline=mnist model=s4, which should reach around 90% accuracy after 1 epoch, taking 2-4 minutes depending on the GPU.

Long Range Arena (LRA)

python -m train wandb=null experiment=s4-lra-listops
python -m train wandb=null experiment=s4-lra-imdb
python -m train wandb=null experiment=s4-lra-cifar
python -m train wandb=null experiment=s4-lra-aan
python -m train wandb=null experiment=s4-lra-pathfinder
python -m train wandb=null experiment=s4-lra-pathx

Note that these experiments may take different amounts of time to train. IMDB should take only 1-2 hours, while Path-X requires several epochs before the loss starts to improve and takes over a day to train to completion.

CIFAR-10

python -m train wandb=null experiment=s4-cifar

The above command line reproduces our best sequential CIFAR model. Decreasing the model size should yield similar results, e.g. halving the hidden dimension with model.d_model=512.

Speech Commands

The Speech Commands dataset we compare against is a modified, smaller 10-way classification task.

python -m train wandb=null experiment=s4-sc

To use the original version with the full 35 classes, pass in dataset.all_classes=true.

Training

The core training infrastructure of this repository is based on PyTorch Lightning, with a configuration scheme based on Hydra. The structure of this integration largely follows the Lightning+Hydra integration template described in https://github.com/ashleve/lightning-hydra-template.

The main experiment entrypoint is train.py and configs are found in configs/. In brief, the main config is found at configs/config.yaml, which is combined with other sets of configs passed on the command line to define an overall YAML config. Most config groups define a single Python object (e.g. a PyTorch nn.Module). The end-to-end training pipeline can be broken down into the following rough groups, where group XX is found under configs/XX/:

model: the sequence-to-sequence model backbone (e.g. a src.models.sequence.SequenceModel)
dataset: the raw dataset (data/target pairs) (e.g. a pytorch Dataset)
loader: how the data is loaded (e.g. a pytorch DataLoader)
encoder: defines a Module that interfaces between data and model backbone
decoder: defines a Module that interfaces between model backbone and targets
task: specifies loss and metrics

Default combinations of dataset+loader+encoder+decoder+task are further consolidated into groups called pipelines.

A run can be performed by passing in a pipeline config, model config, and any additional arguments modifying the default configurations. A simple example experiment is

python -m train pipeline=mnist dataset.permute=True model=s4 model.n_layers=3 model.d_model=128 model.norm=batch model.prenorm=True wandb=null

This uses the permuted sequential MNIST task and an S4 model with a specified number of layers, backbone dimension, and normalization type.

Hydra

It is recommended to read the Hydra documentation to fully understand the configuration framework. For help launching specific experiments, please file an Issue.

Registries

This codebase uses a modification of the Hydra instantiate utility that provides shorthand names for different classes, for convenience in configuration and logging. The mapping from shorthand to full path can be found in src/utils/registry.py.
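Conceptually, the registry is just a mapping from short names to import paths; the entries below are hypothetical examples, not the actual contents of the file:

# Hypothetical shorthand entries; see src/utils/registry.py for the real mapping.
layer_registry = {
    "s4": "src.models.sequence.ss.s4.S4",
    "lssl": "src.models.sequence.ss.lssl.LSSL",
}
# A config such as model/layer=s4 resolves through this mapping to the full
# class path before instantiation.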

WandB

Logging with WandB is built into this repository. In order to use this, simply set your WANDB_API_KEY environment variable and change the wandb.project attribute of configs/config.yaml (or pass it on the command line, e.g. python -m train .... wandb.project=s4).

Set wandb=null to turn off WandB logging.

Models

This repository provides a modular and flexible implementation of sequence models at large.

SequenceModule

SequenceModule (src/models/sequence/base.py) is the abstract interface that all sequence models adhere to. In this codebase, sequence models are defined as sequence-to-sequence maps of shape (batch size, sequence length, input dimension) to (batch size, sequence length, output dimension).

The SequenceModule comes with other methods such as step, which is meant for autoregressive settings, as well as logic to carry optional hidden states (for stateful models such as RNNs or S4).
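In spirit, the interface looks like the condensed sketch below (signatures simplified; see src/models/sequence/base.py for the real definition):

import torch.nn as nn

class SequenceModule(nn.Module):
    def forward(self, x, state=None):
        # Map (batch, length, d_input) -> (batch, length, d_output),
        # optionally returning an updated hidden state.
        raise NotImplementedError

    def step(self, x, state=None):
        # Consume one timestep (batch, d_input) at a time for
        # autoregressive decoding, returning (output, new_state).
        raise NotImplementedError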

SequenceModel

SequenceModel (src/models/sequence/model.py) is the main backbone with configurable options for residual function, normalization placement and type, etc. SequenceModel accepts a black-box config for a layer. Compatible layers are SequenceModules (i.e. composable sequence transformations) found under src/models/sequence/.

S4

This is the main model of this repository. See the S4 Experiments section above for instructions.

LSSL

The LSSL is an old version of S4. It is currently not recommended for use, but the model can be found at src/models/sequence/ss/lssl.py.

It can be run with model/layer=lssl, or with model/layer=lssl model.layer.learn=0 for the LSSL-fixed model, which does not train A, B, or dt.

HiPPO

HiPPO is the mathematical framework that the HiPPO, LSSL, and S4 papers are built on. The logic for HiPPO operators is found under src/models/hippo/.
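As a concrete example, the HiPPO-LegS transition matrix from the HiPPO paper can be built as in this sketch (the repository's version supports additional measures and options):

import numpy as np

def hippo_legs_A(N):
    # A[n, k] = -sqrt((2n+1)(2k+1)) for n > k, -(n+1) for n = k, 0 for n < k.
    q = np.sqrt(2 * np.arange(N) + 1)
    A = np.tril(np.outer(q, q), -1)   # strictly lower-triangular part
    A += np.diag(np.arange(N) + 1)    # diagonal entries n + 1
    return -A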

HiPPO-RNN cells from the original paper (https://arxiv.org/abs/2008.07669) can be found among the RNN cells described below.

RNNs

This codebase contains a flexible and modular implementation of many RNN cells.

Some examples include model=rnn/hippo-legs and model=rnn/hippo-legt for HiPPO variants from the original paper, or model=rnn/gru for a GRU reimplementation, etc.

An exception is model=lstm, which uses the built-in PyTorch LSTM.

Example command (reproducing the Permuted MNIST number from the HiPPO paper, which was SotA at the time):

python train.py pipeline=mnist model=rnn/hippo-legs model.cell_args.hidden_size=512 train.epochs=50 train.batch_size=100 train.lr=0.001

Baselines

Other sequence models are easily incorporated into this repository, and several other baselines have been ported.

These include CNNs such as the WaveGAN Discriminator and CKConv and continuous-time/RNN models such as UnICORNN and LipschitzRNN.

python -m train dataset=mnist model={ckconv,unicornn}

Overall Repository Structure

configs/         config files for model, data pipeline, training loop, etc.
data/            default location of raw data
extensions/      CUDA extension for Cauchy kernel
src/             main source code for models, datasets, etc.
train.py         main entrypoint

Citation

If you use this codebase, or otherwise found our work valuable, please cite:

@article{gu2021efficiently,
  title={Efficiently Modeling Long Sequences with Structured State Spaces},
  author={Gu, Albert and Goel, Karan and R{\'e}, Christopher},
  journal={arXiv preprint arXiv:2111.00396},
  year={2021}
}

@article{gu2021combining,
  title={Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers},
  author={Gu, Albert and Johnson, Isys and Goel, Karan and Saab, Khaled and Dao, Tri and Rudra, Atri and R{\'e}, Christopher},
  journal={Advances in neural information processing systems},
  volume={34},
  year={2021}
}

@article{gu2020hippo,
  title={HiPPO: Recurrent Memory with Optimal Polynomial Projections},
  author={Gu, Albert and Dao, Tri and Ermon, Stefano and Rudra, Atri and Re, Christopher},
  journal={Advances in neural information processing systems},
  volume={33},
  year={2020}
}
Comments
  • S4D Memory Requirements


    Hey, I wanted to give S4D a quick try in my research as a drop-in replacement for S4 (which, as far as I gathered, should be a good way to start), but I'm running into some hard memory limitations. I'm trying to train the DiffWave version of SaShiMi as a first experiment, but the memory requirements seem to increase significantly when replacing S4 with an equivalent S4D layer (with default settings), causing the model to go OOM in my case (so I don't have any precise measurements, but it's at least a 20% increase in overall memory consumption). I use the parameters as discussed in #46. Is this something you'd expect?

    opened by stefan-baumann 26
  • Model can not converge on the LRA Pathfinder


    Hi,

    Thanks for the great work! When I ran your code on the LRA Pathfinder dataset (using your config), I found it fails to converge even by the end of the 200th epoch, as shown in the following log: loss=0.693, val/accuracy=0.499, val/loss=0.693, test/accuracy=0.495, test/loss=0.693, train/accuracy=0.501, train/loss=0.693. The loss stays at 0.693 throughout training.

    Do you have any thoughts on this? Thanks!

    opened by violet-zct 19
  • SaShiMi generation script errors out with own models


    Hey, first of all, great work with the repository; I don't think I've ever worked with a paper repository that's so extensive and well-structured.

    I'm currently trying to train the SaShiMi model on my own dataset (following your guide here: https://github.com/HazyResearch/state-spaces/issues/23), and I run into some issues when trying to generate samples with the trained model. In case this is relevant, I'm doing inference on the checkpoint files, and I changed the number of layers (model.n_layers) to 4 to accommodate the memory limitations of my GPU. Apart from that, I have made no changes to any of the training and model code except for switching the dataset to my own. When I try to call the generation.py script now, I run into a range of errors:

    • The config overrides cause some errors: the hurwitz parameter does not exist anymore, and the setup_step methods don't seem to correctly accept the mode argument (or rather, pass it downstream). I "fixed" this by removing the hurwitz argument override and by adding the mode argument to all module.setup_step() methods and passing it downstream as required.
    • Additionally, setting model.layer.postact=null causes the state_dict to not load successfully anymore, giving me the following error:
    Missing key(s) in state_dict: "model.c_layers.0.layer.output_linear.weight", "model.c_layers.0.layer.output_linear.bias", "model.c_layers.2.layer.output_linear.weight", "model.c_layers.2.layer.output_linear.bias", "model.c_layers.4.layer.output_linear.weight", "model.c_layers.4.layer.output_linear.bias", "model.c_layers.6.layer.output_linear.weight", "model.c_layers.6.layer.output_linear.bias", "model.u_layers.0.1.layer.output_linear.weight", "model.u_layers.0.1.layer.output_linear.bias", "model.u_layers.0.3.layer.output_linear.weight", "model.u_layers.0.3.layer.output_linear.bias", "model.u_layers.0.5.layer.output_linear.weight", "model.u_layers.0.5.layer.output_linear.bias", "model.u_layers.0.7.layer.output_linear.weight", "model.u_layers.0.7.layer.output_linear.bias", "model.u_layers.1.1.layer.output_linear.weight", "model.u_layers.1.1.layer.output_linear.bias", "model.u_layers.1.3.layer.output_linear.weight", "model.u_layers.1.3.layer.output_linear.bias", "model.u_layers.1.5.layer.output_linear.weight", "model.u_layers.1.5.layer.output_linear.bias", "model.u_layers.1.7.layer.output_linear.weight", "model.u_layers.1.7.layer.output_linear.bias". 
    Unexpected key(s) in state_dict: "model.c_layers.0.layer.output_linear.0.weight", "model.c_layers.0.layer.output_linear.0.bias", "model.c_layers.2.layer.output_linear.0.weight", "model.c_layers.2.layer.output_linear.0.bias", "model.c_layers.4.layer.output_linear.0.weight", "model.c_layers.4.layer.output_linear.0.bias", "model.c_layers.6.layer.output_linear.0.weight", "model.c_layers.6.layer.output_linear.0.bias", "model.u_layers.0.1.layer.output_linear.0.weight", "model.u_layers.0.1.layer.output_linear.0.bias", "model.u_layers.0.3.layer.output_linear.0.weight", "model.u_layers.0.3.layer.output_linear.0.bias", "model.u_layers.0.5.layer.output_linear.0.weight", "model.u_layers.0.5.layer.output_linear.0.bias", "model.u_layers.0.7.layer.output_linear.0.weight", "model.u_layers.0.7.layer.output_linear.0.bias", "model.u_layers.1.1.layer.output_linear.0.weight", "model.u_layers.1.1.layer.output_linear.0.bias", "model.u_layers.1.3.layer.output_linear.0.weight", "model.u_layers.1.3.layer.output_linear.0.bias", "model.u_layers.1.5.layer.output_linear.0.weight", "model.u_layers.1.5.layer.output_linear.0.bias", "model.u_layers.1.7.layer.output_linear.0.weight", "model.u_layers.1.7.layer.output_linear.0.bias".
    

    Does this mean that I should rename those keys manually (there's a fairly clear correspondence) to make it work after changing the activation?

    • Finally, even when I pass through the mode parameter in module.setup_step(), I still get this error:
    Traceback (most recent call last):
      File "/home/debaumas/state-spaces/sashimi/generation.py", line 192, in main
        module.setup_step(mode='dense')
      File "/home/debaumas/state-spaces/src/models/sequence/ss/kernel.py", line 1038, in setup_step
        self.kernel.setup_step(mode=mode)
      File "/home/debaumas/state-spaces/src/models/sequence/ss/kernel.py", line 515, in setup_step
        dC = torch.linalg.solve(
    torch._C._LinAlgError: linalg.solve: (Batch element 0): The diagonal element 1 is zero, the solve could not be completed because the input matrix is singular.
    

    Do you have any idea what might be causing this and maybe an idea about how to fix/circumvent this?

    It'd be awesome if you could help point me in the right direction with this.

    Best, Stefan

    opened by stefan-baumann 16
  • GPU Out of Memory


    I was wondering what parameters I could change to be able to run it on GPU with limited RAM. I tried reducing the layers to 4, which did not help. Also, it seems like batch size is set to 1 by default. I am using 4x TITAN RTX 24GB.

    opened by davidmrau 13
  • Inconsistent results of forward (training) and step (inference)


    Hi, I did a simple test to verify the difference between forward and step (mode="dense") on a single unidirectional S4 layer. Given a random sequence, there is a difference: the absolute error is around 1e-2 and the squared error is around 1e-4. I suspect these results are wrong. My verification follows test_step() in src/models/sequence/ss/kernel.py. I'd love to know if you have examples that clearly compare their difference. Thanks:)

    opened by cycycycywu 10
  • Multiple `setup_rnn` calls


    Hey!

    I'm attempting to integrate the Sashimi Backbone into some audio models -- I'd like to train in convolutional mode and run validation inference in RNN mode, but my reading of the code seems to imply that the setup_step call isn't repeatable or reversible (#67 seems to imply this as well).

    In the case that I temporarily want to infer in RNN mode, but then switch back to the convolutional training mode, what's my best option?

    opened by kradonneoh 7
  • Unable to load wt103 checkpoint, size mismatch


    Hi Albert,

    I tried to load your recently uploaded wikitext-103 checkpoint, but encountered the following error:

    RuntimeError: Error(s) in loading state_dict for SequenceLightningModule:
        size mismatch for encoder.0.emb_layers.3.weight: copying a param with shape torch.Size([67738, 16]) from checkpoint, the shape in current model is torch.Size([67737, 16]).
        size mismatch for loss.out_layers_biases.3: copying a param with shape torch.Size([67738]) from checkpoint, the shape in current model is torch.Size([67737]).
        size mismatch for loss_val.out_layers_biases.3: copying a param with shape torch.Size([67738]) from checkpoint, the shape in current model is torch.Size([67737]).

    Do you know why this is? I used the wt103 data downloaded from the transformer-xl repo: https://github.com/kimiyoung/transformer-xl/blob/master/getdata.sh.

    Thanks!

    opened by violet-zct 6
  • ValueError when running on Pathfinder


    Hi, I am getting the following error when trying to train S4 on the pathfinder dataset. Any help would be greatly appreciated.

    Traceback (most recent call last):
      File "/data/al451/state-spaces/train.py", line 553, in main
        train(config)
      File "/data/al451/state-spaces/train.py", line 498, in train
        trainer.fit(model)
      File "/home/al451/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 768, in fit
        self._call_and_handle_interrupt(
      File "/home/al451/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 721, in _call_and_handle_interrupt
        return trainer_fn(*args, **kwargs)
      File "/home/al451/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 809, in _fit_impl
        results = self._run(model, ckpt_path=self.ckpt_path)
      File "/home/al451/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1172, in _run
        self._call_setup_hook()  # allow user to setup lightning_module in accelerator environment
      File "/home/al451/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1492, in _call_setup_hook
        self._call_lightning_module_hook("setup", stage=fn)
      File "/home/al451/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1593, in _call_lightning_module_hook
        output = fn(*args, **kwargs)
      File "/data/al451/state-spaces/train.py", line 56, in setup
        self.dataset.setup()
      File "/data/al451/state-spaces/src/dataloaders/datasets.py", line 1234, in setup
        dataset = PathFinderDataset(self.data_dir, transform=self.default_transforms())
      File "/data/al451/state-spaces/src/dataloaders/datasets.py", line 1130, in __init__
        path_list = sorted(
      File "/data/al451/state-spaces/src/dataloaders/datasets.py", line 1132, in <lambda>
        key=lambda path: int(path.stem),
    ValueError: invalid literal for int() with base 10: '._142'

    Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

    opened by andrewliu2001 5
  • Experiment reproduction issue with updated modules


    Hi, I was trying to reproduce some of your results using the SaShiMi model by running the command

    python -m train experiment=sashimi-sc09 wandb=null
    

    but I get the error

    TypeError: __init__() got an unexpected keyword argument 'pool'
    

    due to the DownPool class no longer accepting a pool parameter at initialization.

    Can I ask if there are any plans to fix these issues so that they work with the current implementations of the different modules?

    opened by jhuang265 5
  • Unused parameters in training


    Hi! I'm running some experiments using your code. For my use-case, I'm using torch.nn.DistributedDataParallel, which automatically detects unused parameters, i.e., parameters that get no gradients.

    The unused parameters are:

    • D (from the S4 module)
    • output_linear.weight and output_linear.bias (from the S4 module). These are instances of the TransposedLinear layer.
    • kernel.C (from SSKernelNPLR).

    I have manually confirmed these parameters don't get gradients by running the following code after computing the loss:

    for name, param in model.named_parameters():
        if param.grad is None:
            print(name)
    

    Usually, the above means the parameters are instantiated but not used. In this case, surprisingly, all the parameters get used in the forward method. However, none of them get used in "vanilla" PyTorch ops. D, output_linear.weight and output_linear.bias get used through opt_einsum.contract, and kernel.C gets used through your Cauchy GPU op.

    Can you confirm the issue on your end? These parameters all look important for the model.

    opened by tjppires 5
  • can't run train.py w/o compiling Cauchy kernel for CUDA


    Dear all,

    I am having trouble compiling the Cauchy kernel, and although I have installed pykeops, running train.py always results in errors like this:

    RuntimeError: [KeOps] This KeOps shared object has been compiled without cuda support:

    1. to perform computations on CPU, simply set tagHostDevice to 0
    2. to perform computations on GPU, please recompile the formula with a working version of cuda.

    The only thing that fixed the issue for me was commenting out the following try/except. Without that (sorry for the ugliness...) the code never fell back to the slow kernel; now it does, but that is certainly not the right way for me to go about it ;)

    I wonder if the try/except needs to check whether the kernel actually runs, not just whether it can be imported?

    '''
    try:
        import pykeops
        from src.models.functional.cauchy import cauchy_conj
        has_pykeops = True
    except ImportError:
        has_pykeops = False
        from src.models.functional.cauchy import cauchy_conj_slow
        if not has_cauchy_extension:
            log.error(
                "Falling back on slow Cauchy kernel. Install at least one of pykeops or the CUDA extension for efficiency."
            )
    '''
    has_pykeops = False
    from src.models.functional.cauchy import cauchy_conj_slow
    if not has_cauchy_extension:
        log.error(
            "Falling back on slow Cauchy kernel. Install at least one of pykeops or the CUDA extension for efficiency."
        )

    opened by DorotheaKolossa 5
  • Interchangeability of cauchy kernel methods


    Hi Albert,

    I've been training an image generation model using the S4 module and the cauchy extension (compiled code on local machine). Within a model trained with the cauchy extension, would you expect performance differences if the naive implementation (slow kernel) was used for evaluation? Or is the naive implementation less robust than the extension?

    My goal is to visually inspect a few things, but I am experiencing problems using the extension in a Jupyter notebook (even when placing all tensors/models on a GPU).

    Thanks again for your time, Tommy

    opened by tlasmanbotch 0
  • Training on midi


    I want to train an S4 model on MIDI, which I already have as discrete events, for example in the form of .csv. How do I create a dataset with which I can train?
    In other words, how can I create a dataset consisting of three columns in a .csv file, and how can I train on such files with python -m train? I would be very grateful for an answer as soon as possible.

    This is what the csv data looks like:

        960 | 70 | 0
        0   | 70 | 74
        120 | 70 | 0
        120 | 70 | 70
        120 | 70 | 0

    opened by ep-oi 3
  • CVE-2007-4559 Patch


    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks to see if all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog post.

    If you have further questions you may contact us through this project's lead researcher, Kasimir Schulz.

    opened by TrellixVulnTeam 0
  • Understanding the moving parts for training


    Hi team!

    The paper and the accompanying codebase are really great! We are trying to use S4 for a different problem, and there are a lot of engineering details that seem to affect training:

    1. Adding/removing GLU activation => Also affects ListOps a lot
    2. Dropout vs Dropout2D
    3. Learning rate of the state space parameters and their schedule in relation to rest of parameters
    4. Whether to share the diagonal and low-rank params depth-wise or not
    5. Whether to share log_step depth-wise or not => Leads to NaN loss at times
    6. Whether to use the NPLR formulation from S4 paper or Sashimi
    7. Whether to use S4 or S4D or DSS
    8. Bidirectional or unidirectional ...

    From your experience, could you share any ideas on how to choose from these options? Also, could you list any other important details that might affect training?

    opened by nbgundavarapu 1
  • Getting `KeyError: 'nvrtc'` on CPU-only machine


    Versions:

    • Python 3.9.14
    • macOS 12.6
    • state-spaces@292984c
    • ludwig@1d8154f

    Hello!

    I'd like to add S4 to the Ludwig OSS project. I've successfully imported and initialized the S4 module. However, in the forward pass, I am running into the following error:

    [KeOps] Warning : Cuda libraries were not detected on the system ; using cpu only mode
    Traceback (most recent call last):
      File "/Users/geoffreyangus/repositories/predibase/ludwig/ludwig/modules/s4_modules.py", line 1670, in <module>
        main()
      File "/Users/geoffreyangus/repositories/predibase/ludwig/ludwig/modules/s4_modules.py", line 1666, in main
        module(torch.randn(2, 16, 100))
      File "/Users/geoffreyangus/repositories/predibase/ludwig/venv39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1423, in _call_impl
        return forward_call(*input, **kwargs)
      File "/Users/geoffreyangus/repositories/predibase/ludwig/ludwig/modules/s4_modules.py", line 1582, in forward
        k, k_state = self.kernel(L=L_kernel, rate=rate, state=state)  # (C H L) (B C H L)
      File "/Users/geoffreyangus/repositories/predibase/ludwig/venv39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1423, in _call_impl
        return forward_call(*input, **kwargs)
      File "/Users/geoffreyangus/repositories/predibase/ludwig/ludwig/modules/s4_modules.py", line 1388, in forward
        return self.kernel(state=state, L=L, rate=rate)
      File "/Users/geoffreyangus/repositories/predibase/ludwig/venv39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1423, in _call_impl
        return forward_call(*input, **kwargs)
      File "/Users/geoffreyangus/repositories/predibase/ludwig/ludwig/modules/s4_modules.py", line 821, in forward
        r = cauchy_conj(v, z, w)
      File "/Users/geoffreyangus/repositories/predibase/ludwig/ludwig/modules/s4_modules.py", line 59, in cauchy_conj
        r = 2 * cauchy_mult(v, z, w, backend="GPU")
      File "/Users/geoffreyangus/repositories/predibase/ludwig/venv39/lib/python3.9/site-packages/pykeops/torch/generic/generic_red.py", line 624, in __call__
        out = GenredAutograd.apply(
      File "/Users/geoffreyangus/repositories/predibase/ludwig/venv39/lib/python3.9/site-packages/pykeops/torch/generic/generic_red.py", line 78, in forward
        myconv = keops_binder["nvrtc" if tagCPUGPU else "cpp"](
    KeyError: 'nvrtc'
    

    Here is the code snippet I am currently running from within the file implementing the S4 class:

    def main():
        module = S4(
            16,
            gate=4,  # Multiplicative gating layer that also expands dimension by factor of 4
            bottleneck=4,  # Reduce dimension of SSM by factor of 4
            measure="legs",  # Randomly initialize A
            dt_min=1.0,
            dt_max=1.0,  # Initialize dt to 1.0
            lr={"dt": 0.0, "B": 0.0},  # Freeze B and dt
        )
        module(torch.randn(2, 16, 100))
    
    
    if __name__ == "__main__":
        main()
    

    To reproduce the error:

    1. Download Ludwig on a CPU-only machine. cd into the repository.
    2. Checkout the hack-s4 branch.
    3. Run python ludwig/modules/s4_modules.py.

    Let me know what you think, thanks!

    opened by geoffreyangus 2