PyTorch implementation of the Quasi-Recurrent Neural Network - up to 16 times faster than NVIDIA's cuDNN LSTM

Overview

Quasi-Recurrent Neural Network (QRNN) for PyTorch

Updated to support multi-GPU environments via DataParallel - see the multigpu_dataparallel.py example.

This repository contains a PyTorch implementation of Salesforce Research's Quasi-Recurrent Neural Networks paper.

The QRNN provides similar accuracy to the LSTM but can be between 2 and 17 times faster than the highly optimized NVIDIA cuDNN LSTM implementation, depending on the use case.

To install, simply run:

pip install cupy pynvrtc git+https://github.com/salesforce/pytorch-qrnn

If you use this code or our results in your research, please cite:

@article{bradbury2016quasi,
  title={{Quasi-Recurrent Neural Networks}},
  author={Bradbury, James and Merity, Stephen and Xiong, Caiming and Socher, Richard},
  journal={International Conference on Learning Representations (ICLR 2017)},
  year={2017}
}

Software Requirements

This codebase requires Python 3, PyTorch, pynvrtc (NVIDIA's Python Bindings to NVRTC), and CuPy. While the codebase contains a CPU implementation of the QRNN, the GPU QRNN implementation is used by default if possible. Requirements are provided in requirements.txt.

Example Usage

We've updated the previously released Salesforce Research AWD-LSTM language modeling codebase to support use of the AWD-QRNN. With the same number of parameters as the LSTM and less well-tuned hyperparameters, the QRNN model trains over twice as quickly and achieves nearly equivalent state-of-the-art language modeling results. For full details, refer to the AWD-LSTM-LM repository.

Usage

The QRNN API is meant to be drop-in compatible with the LSTM for many standard use cases. As such, the easiest thing to do is replace any GRU or LSTM module with the QRNN.

Note: bidirectional QRNN is not yet supported, though it will be in the near future.

import torch
from torchqrnn import QRNN

seq_len, batch_size, hidden_size = 7, 20, 256
size = (seq_len, batch_size, hidden_size)
X = torch.autograd.Variable(torch.rand(size), requires_grad=True).cuda()

qrnn = QRNN(hidden_size, hidden_size, num_layers=2, dropout=0.4)
qrnn.cuda()
output, hidden = qrnn(X)

print(output.size(), hidden.size())

The full documentation for the QRNN is listed below:

QRNN(input_size, hidden_size, num_layers, dropout=0):
    Applies a multiple layer Quasi-Recurrent Neural Network (QRNN) to an input sequence.

    Args:
        input_size: The number of expected features in the input x.
        hidden_size: The number of features in the hidden state h. If not specified, the input size is used.
        num_layers: The number of QRNN layers to produce.
        layers: List of preconstructed QRNN layers to use for the QRNN module (optional).
        save_prev_x: Whether to store previous inputs for use in future convolutional windows (i.e. for a continuing sequence such as in language modeling). If true, you must call reset to remove cached previous values of x. Default: False.
        window: Defines the size of the convolutional window (how many previous tokens to look at when computing the QRNN values). Supports 1 and 2. Default: 1.
        zoneout: Probability of applying zoneout (i.e. failing to update elements in the hidden state) to the hidden state updates. Default: 0.
        output_gate: If True, performs QRNN-fo (applying an output gate to the output). If False, performs QRNN-f. Default: True.
        use_cuda: If True, uses fast custom CUDA kernel. If False, uses naive for loop. Default: True.

    Inputs: X, hidden
        - X (seq_len, batch, input_size): tensor containing the features of the input sequence.
        - hidden (layers, batch, hidden_size): tensor containing the initial hidden state for the QRNN.

    Outputs: output, h_n
        - output (seq_len, batch, hidden_size): tensor containing the output of the QRNN for each timestep.
        - h_n (layers, batch, hidden_size): tensor containing the hidden state for t=seq_len
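The zoneout argument above can be illustrated with a small pure-Python sketch. It assumes the recurrence convention used in this README (h_t = f_t * x_t + (1 - f_t) * h_{t-1}), where a forget gate of 0 leaves the previous hidden state unchanged. The function name apply_zoneout and the evaluation-time scaling are illustrative assumptions, not a description of the library's internals.

```python
import random
from typing import List, Optional


def apply_zoneout(f: List[float], zoneout: float, training: bool = True,
                  rng: Optional[random.Random] = None) -> List[float]:
    """Hypothetical helper: zero gates with probability `zoneout` so that
    h_t = f * x + (1 - f) * h_prev keeps h_prev for those elements."""
    rng = rng or random.Random()
    if not zoneout:
        return list(f)
    if training:
        # Each gate is zeroed independently with probability `zoneout`.
        return [ft if rng.random() >= zoneout else 0.0 for ft in f]
    # Assumed evaluation behaviour: scale gates by the keep probability.
    return [ft * (1.0 - zoneout) for ft in f]


if __name__ == "__main__":
    gates = [0.2, 0.9, 0.5]
    print(apply_zoneout(gates, 0.5, training=True, rng=random.Random(0)))
    print(apply_zoneout(gates, 0.5, training=False))
```

With zoneout=0 the gates pass through untouched, which matches the documented default.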

The included QRNN layer supports convolutional windows of size 1 or 2 but will be extended in the future to support arbitrary convolutions.

If you are using convolutional windows of size 2 (i.e. looking at the inputs from two previous timesteps to compute the input) and want to run over a long sequence in batches, such as when using BPTT, you can set save_prev_x=True and call reset when you wish to reset the cached previous inputs.
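To see why the cached input matters, here is a minimal pure-Python sketch of how a window of size 2 pairs each input with its predecessor. The helper window2_inputs is hypothetical, and it assumes the first timestep is padded with zero when no previous input is available. Carrying the last input of one chunk into the next gives the same pairs as processing the whole sequence at once.

```python
from typing import List, Optional, Tuple


def window2_inputs(chunk: List[float],
                   prev_x: Optional[float] = None) -> Tuple[List[Tuple[float, float]], float]:
    """Pair every x_t with x_{t-1}; return the pairs and the value to cache."""
    previous = 0.0 if prev_x is None else prev_x  # assumed zero padding
    pairs = []
    for x_t in chunk:
        pairs.append((previous, x_t))
        previous = x_t
    return pairs, previous


if __name__ == "__main__":
    sequence = [1.0, 2.0, 3.0, 4.0]
    full, _ = window2_inputs(sequence)

    # Process the same sequence in two chunks, carrying the cached input.
    first, cache = window2_inputs(sequence[:2])
    second, _ = window2_inputs(sequence[2:], cache)
    print(first + second == full)

    # Without the cache, the second chunk starts from the zero padding.
    second_no_cache, _ = window2_inputs(sequence[2:])
    print(second_no_cache[0], "instead of", second[0])
```

Dropping the cache at a sequence boundary corresponds to the reset call described above.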

If you want flexibility in the definition of each QRNN layer, you can construct individual QRNNLayer modules and pass them to the QRNN module using the layers argument.

Speed

Speeds are between 2 and 17 times faster than NVIDIA's cuDNN LSTM, with the difference a result of varying batch size and sequence length. The largest gains are for small batch sizes or long sequence lengths, both highlighting the LSTM's parallelization difficulty due to forced sequentiality. For full information, refer to the Quasi-Recurrent Neural Networks paper.

Figure 4 from QRNN paper

Pictured above is Figure 4 from the QRNN paper:
Left: Training speed for two-layer 640-unit PTB LM on a batch of 20 examples of 105 timesteps. “RNN” and “softmax” include the forward and backward times, while “optimization overhead” includes gradient clipping, L2 regularization, and SGD computations.
Right: Inference speed advantage of a 320-unit QRNN layer alone over an equal-sized cuDNN LSTM layer for data with the given batch size and sequence length. Training results are similar.

Extending the QRNN speed advantage to other recurrent architectures with ForgetMult

The QRNN architecture's speed advantage comes from two primary sources: the ability to batch all computations into a few large matrix multiplications and the use of a fast element-wise recurrence function. This recurrence function, named ForgetMult, is general and can be used in other scenarios. The ForgetMult takes two arguments - the candidate input x and forget gates f - and computes h = f * x + (1 - f) * hm1 where hm1 is the previous hidden state output.

The QRNN class is a thin wrapper around this that performs the large matrix multiplications for the candidate x, the forget gates f, and the output gates o. Any other operation which requires recurrence and can have precomputed values for the candidate x and forget gates f can use this fast form of recurrence.

Example usage of the ForgetMult module: output = ForgetMult()(f, x, hidden).

    ForgetMult computes a simple recurrent equation:
    h_t = f_t * x_t + (1 - f_t) * h_{t-1}

    This equation is equivalent to dynamic weighted averaging.

    Inputs: X, F, hidden_init, use_cuda
        - X (seq_len, batch, input_size): tensor containing the features of the input sequence.
        - F (seq_len, batch, input_size): tensor containing the forget gate values, assumed in range [0, 1].
        - hidden_init (batch, input_size): tensor containing the initial hidden state for the recurrence (h_{t-1}).
        - use_cuda: If True, uses the fast element-wise CUDA kernel for recurrence. If False, uses a naive for loop. Default: True.
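The recurrence documented above can be written out as a naive loop. The following is a pure-Python sketch for illustration only: the name forget_mult is hypothetical, the batch dimension is omitted for brevity, and it makes no attempt to match the speed of the CUDA kernel.

```python
from typing import List, Optional, Sequence


def forget_mult(f: Sequence[Sequence[float]],
                x: Sequence[Sequence[float]],
                hidden_init: Optional[Sequence[float]] = None) -> List[List[float]]:
    """Naive reference for h_t = f_t * x_t + (1 - f_t) * h_{t-1}.

    f and x are [seq_len][features] nested lists; hidden_init is the
    state before the first timestep and defaults to zeros.
    """
    if len(f) != len(x):
        raise ValueError("f and x must have the same sequence length")
    width = len(x[0]) if x else 0
    h_prev = list(hidden_init) if hidden_init is not None else [0.0] * width
    outputs = []
    for f_t, x_t in zip(f, x):
        # Element-wise weighted average of the candidate and the previous state.
        h_prev = [ft * xt + (1.0 - ft) * hp
                  for ft, xt, hp in zip(f_t, x_t, h_prev)]
        outputs.append(h_prev)
    return outputs


if __name__ == "__main__":
    f = [[1.0], [0.5], [0.0]]
    x = [[2.0], [4.0], [8.0]]
    print(forget_mult(f, x))  # [[2.0], [3.0], [3.0]]
```

A gate of 1 takes the candidate input outright, a gate of 0 copies the previous state, and values in between blend the two, which is the dynamic weighted averaging mentioned above.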

Want to help out?

First, thanks! :)

Open tasks that are interesting:

  • Modify the ForgetMult CUDA kernel to produce a BackwardForgetMult. This will enable a bidirectional QRNN. The input should be the same - f and x - but the kernel should walk backwards through the inputs.
  • Bidirectional QRNN support (requires the modification above)
  • Support PyTorch's PackedSequence such that variable length sequences are correctly masked
  • Show how to use the underlying fast recurrence operator ForgetMult in other generic ways
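As a starting point for the first task in the list above, the backward recurrence can be sketched in pure Python. This is only an illustration of the intended behaviour under the assumption that the backward pass uses the same equation with h_{t+1} in place of h_{t-1}. The name backward_forget_mult is hypothetical and the batch dimension is omitted.

```python
from typing import List, Optional, Sequence


def backward_forget_mult(f: Sequence[Sequence[float]],
                         x: Sequence[Sequence[float]],
                         hidden_init: Optional[Sequence[float]] = None) -> List[List[float]]:
    """Compute h_t = f_t * x_t + (1 - f_t) * h_{t+1}, walking from the
    last timestep to the first. Inputs are [seq_len][features] lists."""
    if len(f) != len(x):
        raise ValueError("f and x must have the same sequence length")
    width = len(x[0]) if x else 0
    h_next = list(hidden_init) if hidden_init is not None else [0.0] * width
    outputs: List[List[float]] = [[] for _ in x]
    for t in range(len(x) - 1, -1, -1):
        h_next = [ft * xt + (1.0 - ft) * hn
                  for ft, xt, hn in zip(f[t], x[t], h_next)]
        outputs[t] = h_next
    return outputs


if __name__ == "__main__":
    f = [[1.0], [0.5], [0.0]]
    x = [[2.0], [4.0], [8.0]]
    print(backward_forget_mult(f, x))  # [[2.0], [2.0], [0.0]]
```

The same result can be obtained by reversing f and x along the time axis, running the forward recurrence, and reversing the output, which may be a useful check when testing a CUDA version.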
Comments
  • Multi-GPU [Torch DataParallel]

    Could you guys get it to work with torch.nn.DataParallel(model).cuda()? I could not, but perhaps did not try hard enough. Can't tell if it's a wrong-GPU problem, or CuPy won't support it.

    Runs pretty fast on 1x GPUs though. A bit faster than 4x GPUs for vanilla LSTM, but not by much without scaling to multiple GPUs...

    opened by moscow25 7
  • Error in executing QRNN

    Hello, I got an error when I ran my modified code from the example.

    import time
    
    import numpy as np
    
    import torch
    import torch.nn as nn
    
    import torchqrnn.forget_mult
    from torchqrnn import QRNN
    
    class Model(nn.Module):
    
        def __init__(self, hidden_size=1024, layers=3, vocab=100):
            super(Model, self).__init__()
    
            self.embedding = nn.Embedding(vocab, hidden_size)
    
            self.rnn = QRNN(hidden_size, hidden_size, num_layers=layers)
    
        def forward(self, x):
            x = self.embedding(x)
            out, hidden = self.rnn(x)
            return out[:-1]
    
    H = 256
    SEQ = 100
    BATCH = 64
    
    H = 1024
    SEQ = 500
    BATCH = 128
    
    LOOPS = 500
    
    np.random.seed(42)
    torch.manual_seed(42)
    torch.cuda.manual_seed(42)
    
    x = torch.autograd.Variable(torch.LongTensor(np.random.randint(0, 100, [BATCH, SEQ])))
    x = x.cuda()
    
    np.random.seed(42)
    torch.manual_seed(42)
    torch.cuda.manual_seed(42)
    
    model = Model(H)
    model = model.cuda()
    model(x)
    

    The error message is:

    Traceback (most recent call last):
      File "train.py", line 42, in <module>
        model(x)
      File "/home/michael/.virtualenvs/fyp-pytorch/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
        result = self.forward(*input, **kwargs)
      File "train.py", line 22, in forward
        out, hidden = self.rnn(x)
      File "/home/michael/.virtualenvs/fyp-pytorch/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/michael/.virtualenvs/fyp-pytorch/local/lib/python2.7/site-packages/torchqrnn/qrnn.py", line 160, in forward
        input, hn = layer(input, None if hidden is None else hidden[i])
      File "/home/michael/.virtualenvs/fyp-pytorch/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/michael/.virtualenvs/fyp-pytorch/local/lib/python2.7/site-packages/torchqrnn/qrnn.py", line 95, in forward
        C = ForgetMult()(F, Z, hidden, use_cuda=self.use_cuda)
      File "/home/michael/.virtualenvs/fyp-pytorch/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/michael/.virtualenvs/fyp-pytorch/local/lib/python2.7/site-packages/torchqrnn/forget_mult.py", line 175, in forward
        if hidden_init is None: return GPUForgetMult()(f, x) if use_cuda else CPUForgetMult()(f, x)
      File "/home/michael/.virtualenvs/fyp-pytorch/local/lib/python2.7/site-packages/torchqrnn/forget_mult.py", line 127, in forward
        self.forget_mult(grid=grid, block=(grid_hidden_size, 1), args=[result.data_ptr(), f.data_ptr(), x.data_ptr(), seq_size, batch_size, hidden_size], stream=self.stream)
      File "cupy/cuda/function.pyx", line 143, in cupy.cuda.function.Function.__call__
    TypeError: 'float' object cannot be interpreted as an index
    

    I am using CUDA 8.0 and python 2.7. Thanks.

    opened by michjk 3
  • Bad squeeze in CPUForgetMult

    Hi,

    It looks like I've encountered a lil bug when batch_size=1 at CPU inference ( haven't checked on GPU yet ). I've found that, whilst forwarding in CPUForgetMult, there is a general squeeze for all dimensions when appending each h to the resulting list of tensors, concretely:

    result.append(h.squeeze())
    

    It turns out the size of h at each iteration is (1, batch_size, feats), so when we squeeze with batch_size=1 the resulting tensor is of size (feats,), resulting in a final stack torch.stack(result) of size (seq_len, feats). This will cause an error when, in QRNN forward, we do C[-1:, :, :] trying to access every sample in batch dimension (i.e. 1) which does not exist because of the squeeze. We can just specify the specific squeeze dimension to be 0 (in batch_first=False option, which is the only one available atm).
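The shape arithmetic described in this report can be checked with a small pure-Python model of squeeze semantics. The helper squeeze_shape is hypothetical and only mimics how a squeeze changes a shape tuple: with no dimension given, every size-1 dimension is dropped, and with a dimension given, only that one is dropped if its size is 1.

```python
from typing import Optional, Tuple


def squeeze_shape(shape: Tuple[int, ...], dim: Optional[int] = None) -> Tuple[int, ...]:
    """Return the shape that results from squeezing `shape`."""
    if dim is None:
        # A general squeeze removes every dimension of size 1.
        return tuple(s for s in shape if s != 1)
    dim = dim % len(shape)  # allow negative indices
    if shape[dim] == 1:
        return tuple(s for i, s in enumerate(shape) if i != dim)
    return tuple(shape)


if __name__ == "__main__":
    feats = 256
    # batch_size = 20: both variants drop only the leading time dimension.
    print(squeeze_shape((1, 20, feats)), squeeze_shape((1, 20, feats), 0))
    # batch_size = 1: the general squeeze also removes the batch dimension.
    print(squeeze_shape((1, 1, feats)), squeeze_shape((1, 1, feats), 0))
```

Squeezing only dimension 0 keeps the batch dimension for batch_size=1, so the stacked result stays three-dimensional.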

    opened by santi-pdp 2
  • modify default value of hidden_size

    hidden_size: The number of features in the hidden state h. If not specified, the input size is used.

    It seems that current code has a small mistake.

    opened by gyu-don 2
  • [WinError 126] The specified module could not be found - Any idea of the error source?

    Hi there,

    I am trying to replace my LSTM architecture with your interesting QRNN. Following your readme file, everything is installed successfully on my machine. However, while running the example provided, I keep getting this issue. Any idea of the reason?

    ============================================================

    OSError                                   Traceback (most recent call last)
    in
          8 qrnn = QRNN(hidden_size, hidden_size, num_layers=2, dropout=0.4)
          9 qrnn.cuda()
    ---> 10 output, hidden = qrnn(X)
         11
         12 print(output.size(), hidden.size())

    ~\Anaconda3\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
        545             result = self._slow_forward(*input, **kwargs)
        546         else:
    --> 547             result = self.forward(*input, **kwargs)
        548         for hook in self._forward_hooks.values():
        549             hook_result = hook(self, input, result)

    ~\Anaconda3\lib\site-packages\torchqrnn\qrnn.py in forward(self, input, hidden)
        162
        163         for i, layer in enumerate(self.layers):
    --> 164             input, hn = layer(input, None if hidden is None else hidden[i])
        165             next_hidden.append(hn)
        166

    ~\Anaconda3\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
        545             result = self._slow_forward(*input, **kwargs)
        546         else:
    --> 547             result = self.forward(*input, **kwargs)
        548         for hook in self._forward_hooks.values():
        549             hook_result = hook(self, input, result)

    ~\Anaconda3\lib\site-packages\torchqrnn\qrnn.py in forward(self, X, hidden)
         97         # Forget Mult
         98         # For testing QRNN without ForgetMult CUDA kernel, C = Z * F may be useful
    ---> 99         C = ForgetMult()(F, Z, hidden, use_cuda=self.use_cuda)
        100
        101         # Apply (potentially optional) output gate

    ~\Anaconda3\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
        545             result = self._slow_forward(*input, **kwargs)
        546         else:
    --> 547             result = self.forward(*input, **kwargs)
        548         for hook in self._forward_hooks.values():
        549             hook_result = hook(self, input, result)

    ~\Anaconda3\lib\site-packages\torchqrnn\forget_mult.py in forward(self, f, x, hidden_init, use_cuda)
        176         ###
        177         # Avoiding 'RuntimeError: expected a Variable argument, but got NoneType' when hidden_init is None
    --> 178         if hidden_init is None: return GPUForgetMult()(f, x) if use_cuda else CPUForgetMult()(f, x)
        179         return GPUForgetMult()(f, x, hidden_init) if use_cuda else CPUForgetMult()(f, x, hidden_init)
        180

    ~\Anaconda3\lib\site-packages\torchqrnn\forget_mult.py in forward(self, f, x, hidden_init)
        118
        119     def forward(self, f, x, hidden_init=None):
    --> 120         self.compile()
        121         seq_size, batch_size, hidden_size = f.size()
        122         result = f.new(seq_size + 1, batch_size, hidden_size)

    ~\Anaconda3\lib\site-packages\torchqrnn\forget_mult.py in compile(self)
        100     def compile(self):
        101         if self.ptx is None:
    --> 102             program = Program(kernel.encode(), 'recurrent_forget_mult.cu'.encode())
        103             GPUForgetMult.ptx = program.compile()
        104

    ~\Anaconda3\lib\site-packages\pynvrtc\compiler.py in __init__(self, src, name, headers, include_names, lib_name)
         47                  headers=[], include_names=[],
         48                  lib_name=''):
    ---> 49         self._interface = NVRTCInterface(lib_name)
         50         self._program = self._interface.nvrtcCreateProgram(src, name,
         51                                                            headers,

    ~\Anaconda3\lib\site-packages\pynvrtc\interface.py in __init__(self, lib_path)
         85     def __init__(self, lib_path=''):
         86         self._lib = None
    ---> 87         self._load_nvrtc_lib(lib_path)
         88
         89     def _load_nvrtc_lib(self, lib_path):

    ~\Anaconda3\lib\site-packages\pynvrtc\interface.py in _load_nvrtc_lib(self, lib_path)
        107             name = lib_path
        108
    --> 109         self._lib = cdll.LoadLibrary(name)
        110
        111         self._lib.nvrtcCreateProgram.argtypes = [

    ~\Anaconda3\lib\ctypes\__init__.py in LoadLibrary(self, name)
        424
        425     def LoadLibrary(self, name):
    --> 426         return self._dlltype(name)
        427
        428 cdll = LibraryLoader(CDLL)

    ~\Anaconda3\lib\ctypes\__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error)
        346
        347         if handle is None:
    --> 348             self._handle = _dlopen(self._name, mode)
        349         else:
        350             self._handle = handle

    OSError: [WinError 126] The specified module could not be found

    opened by issararab 1
  • Problem with QRNN num_layers=2, layers=None, and input_size != hidden_size

    This code will not work:

    import torch
    from torchqrnn import QRNN
    
    seq_len, batch_size, hidden_size = 7, 20, 256
    size = (seq_len, batch_size, 32)
    X = torch.autograd.Variable(torch.rand(size), requires_grad=True).cuda()
    print(X.size())
    
    qrnn = QRNN(32, hidden_size, num_layers=2, dropout=0.4)
    qrnn.cuda()
    output, hidden = qrnn(X)
    
    print(output.size(), hidden.size())
    

    I think the problem is caused by this line:

    https://github.com/salesforce/pytorch-qrnn/blob/b64688071e947451ac0dab2d1ac9ef73673406f8/torchqrnn/qrnn.py#L142

    Aren't the initialization parameters supposed to be QRNNLayer(hidden_size, hidden_size, **kwargs) starting from the second layer?
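The sizing that this report proposes can be sketched without PyTorch. The helper qrnn_layer_sizes below is hypothetical and only lists the (input, output) feature sizes each stacked layer would need: the first layer maps input_size to hidden_size, and every later layer maps hidden_size to hidden_size.

```python
from typing import List, Tuple


def qrnn_layer_sizes(input_size: int, hidden_size: int,
                     num_layers: int) -> List[Tuple[int, int]]:
    """Return (in_features, out_features) for each layer of a stack."""
    return [(input_size if i == 0 else hidden_size, hidden_size)
            for i in range(num_layers)]


if __name__ == "__main__":
    # The configuration from the report: 32 input features, 256 hidden units.
    print(qrnn_layer_sizes(32, 256, 2))  # [(32, 256), (256, 256)]
```

If every layer were built with (input_size, hidden_size) instead, the second layer would expect 32 features but receive 256, which is consistent with the failure described above.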

    opened by ceshine 1
  • Add MultiGPU / DataParallel support for ForgetMult and hence QRNN

    This pull request allows ForgetMult and hence QRNN to handle multiple GPUs using the DataParallel wrapper.

    The code appears to be working cleanly but I want to run a few more sanity checks before merging in.

    opened by Smerity 1
  • Legacy autograd Runtime error

    Hi all,

    I am currently doing a small project on image captioning. I came across QRNN and thought of replacing LSTM with QRNN. Everything was working fine with LSTM with longer training times but as soon as I replace LSTM with QRNN, I am getting this error.

    Legacy autograd function with non-static forward method is deprecated. Please use new-style autograd function with static forward method. (Example: https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function)

    Even when running the sample code provided in this repo, I get the same error.

    import torch
    from torchqrnn import QRNN

    seq_len, batch_size, hidden_size = 7, 20, 256
    size = (seq_len, batch_size, hidden_size)
    X = torch.autograd.Variable(torch.rand(size), requires_grad=True).cuda()

    qrnn = QRNN(hidden_size, hidden_size, num_layers=2, dropout=0.4)
    qrnn.cuda()
    output, hidden = qrnn(X)

    print(output.size(), hidden.size())

    Runtime error: Legacy autograd function with non-static forward method is deprecated. Please use new-style autograd function with static forward method. (Example: https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function)

    Please tell me how to get rid of this error. Thanks

    opened by vaib-saxena 2
  • AttributeError: 'bytes' object has no attribute 'encode'

    AttributeError                            Traceback (most recent call last)
    in
         49 model.train()
         50 optimizer.zero_grad()
    ---> 51 classes = model(data)
         52 loss = focal_loss(classes, focal_label) + margin_loss(classes, margin_label)
         53 loss.backward()

    ~/.local/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
        487             result = self._slow_forward(*input, **kwargs)
        488         else:
    --> 489             result = self.forward(*input, **kwargs)
        490         for hook in self._forward_hooks.values():
        491             hook_result = hook(self, input, result)

    in forward(self, batch)
         63         # x = torch.nn.utils.rnn.pack_padded_sequence(x, lengths)
         64         # self.rnn.flatten_parameters()
    ---> 65         x, _ = self.rnn(x)
         66         # x, _ = torch.nn.utils.rnn.pad_packed_sequence(x)
         67         x = self.dropout(x)

    ~/.local/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
        487             result = self._slow_forward(*input, **kwargs)
        488         else:
    --> 489             result = self.forward(*input, **kwargs)
        490         for hook in self._forward_hooks.values():
        491             hook_result = hook(self, input, result)

    ~/.local/lib/python3.6/site-packages/torchqrnn/qrnn.py in forward(self, input, hidden)
        162
        163         for i, layer in enumerate(self.layers):
    --> 164             input, hn = layer(input, None if hidden is None else hidden[i])
        165             next_hidden.append(hn)
        166

    ~/.local/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
        487             result = self._slow_forward(*input, **kwargs)
        488         else:
    --> 489             result = self.forward(*input, **kwargs)
        490         for hook in self._forward_hooks.values():
        491             hook_result = hook(self, input, result)

    ~/.local/lib/python3.6/site-packages/torchqrnn/qrnn.py in forward(self, X, hidden)
         97         # Forget Mult
         98         # For testing QRNN without ForgetMult CUDA kernel, C = Z * F may be useful
    ---> 99         C = ForgetMult()(F, Z, hidden, use_cuda=self.use_cuda)
        100
        101         # Apply (potentially optional) output gate

    ~/.local/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
        487             result = self._slow_forward(*input, **kwargs)
        488         else:
    --> 489             result = self.forward(*input, **kwargs)
        490         for hook in self._forward_hooks.values():
        491             hook_result = hook(self, input, result)

    ~/.local/lib/python3.6/site-packages/torchqrnn/forget_mult.py in forward(self, f, x, hidden_init, use_cuda)
        176         ###
        177         # Avoiding 'RuntimeError: expected a Variable argument, but got NoneType' when hidden_init is None
    --> 178         if hidden_init is None: return GPUForgetMult()(f, x) if use_cuda else CPUForgetMult()(f, x)
        179         return GPUForgetMult()(f, x, hidden_init) if use_cuda else CPUForgetMult()(f, x, hidden_init)
        180

    ~/.local/lib/python3.6/site-packages/torchqrnn/forget_mult.py in forward(self, f, x, hidden_init)
        118
        119     def forward(self, f, x, hidden_init=None):
    --> 120         self.compile()
        121         seq_size, batch_size, hidden_size = f.size()
        122         result = f.new(seq_size + 1, batch_size, hidden_size)

    ~/.local/lib/python3.6/site-packages/torchqrnn/forget_mult.py in compile(self)
        100     def compile(self):
        101         if self.ptx is None:
    --> 102             program = Program(kernel.encode(), 'recurrent_forget_mult.cu'.encode())
        103             GPUForgetMult.ptx = program.compile()
        104

    ~/.local/lib/python3.6/site-packages/pynvrtc/compiler.py in __init__(self, src, name, headers, include_names, lib_name)
         50         self._program = self._interface.nvrtcCreateProgram(src, name,
         51                                                            headers,
    ---> 52                                                            include_names)
         53
         54     def __del__(self):

    ~/.local/lib/python3.6/site-packages/pynvrtc/interface.py in nvrtcCreateProgram(self, src, name, headers, include_names)
        198         include_names_array[:] = encode_str_list(include_names)
        199         code = self._lib.nvrtcCreateProgram(byref(res),
    --> 200                                             c_char_p(encode_str(src)), c_char_p(encode_str(name)),
        201                                             len(headers),
        202                                             headers_array, include_names_array)

    ~/.local/lib/python3.6/site-packages/pynvrtc/interface.py in encode_str(s)
         52     if is_python2:
         53         return s
    ---> 54     return s.encode("utf-8")
         55
         56

    AttributeError: 'bytes' object has no attribute 'encode'

    opened by cy69855522 4
  • Fixes wrong parameter call with the latest version of pynvrtc

    Leaving this fix here in case someone tries to run it with Python 3 and recent libraries. The error I encountered was 'Program' object has no attribute '_program', which was caused by the latest version of pynvrtc now accepting a string instead of bytes.

    More details:
    https://github.com/jonas-koehler/s2cnn/issues/21#issuecomment-409488734
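One way to make calling code tolerant of either input type is to normalise the kernel source to text before handing it on. The sketch below is a generic pure-Python illustration, not the change made in this pull request. The helper as_text and the sample kernel string are hypothetical.

```python
from typing import Union


def as_text(source: Union[str, bytes], encoding: str = "utf-8") -> str:
    """Return `source` as str, decoding it first if it is bytes."""
    if isinstance(source, bytes):
        return source.decode(encoding)
    return source


if __name__ == "__main__":
    kernel_source = "extern \"C\" __global__ void noop() {}"  # placeholder kernel
    # Both calls yield the same str, so a later .encode() cannot fail.
    print(as_text(kernel_source) == as_text(kernel_source.encode()))
```

Passing the result of such a helper to a library that encodes its input internally avoids calling .encode() on a bytes object, which is the failure shown in the AttributeError report above.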

    cla:signed 
    opened by nkcr 0
  • Backward ForgetMult + Bidirectional QRNN

    I had some time on the weekend and thought this might be fun. The PR implements a backward ForgetMult for CPU and CUDA and changes to QRNN and QRNNLayer to make them bidirectional. I tried to keep changes to the original code minimal without duplicating a lot of code.

    I tested this with Pytorch 0.4.0 and Python 3.6.5. Gradient checks etc. pass. On preliminary results with IMDB movie reviews it looks like a bidirectional QRNN (2 layers, each with forward and backward, 256 hidden units) performs slightly better (~0.5% accuracy) than a unidirectional QRNN of same size (4 layers, 256 hidden units), but I haven't had enough time to finish experiments to be certain on this.

    Let me know what you think and where the code still needs changes (I know it's not perfect in some places, especially QRNNLayer).

    cla:signed 
    opened by elmarhaussmann 3