Training RNNs as Fast as CNNs (https://arxiv.org/abs/1709.02755)

ASAPP Research

Last update: Jan 1, 2023

Overview

News

SRU++, a new SRU variant, is released. [tech report] [blog]

The experimental code and SRU++ implementation are available on the dev branch, which will be merged into master later.

About

SRU is a recurrent unit that can run over 10 times faster than cuDNN LSTM, with no loss of accuracy on the many tasks tested.

Average processing time of LSTM, conv2d and SRU, tested on GTX 1070

For example, the figure above presents the processing time of a single mini-batch of 32 samples. SRU achieves 10 to 16 times speed-up compared to LSTM, and operates as fast as (or faster than) word-level convolution using conv2d.

Reference:

Simple Recurrent Units for Highly Parallelizable Recurrence [paper]

@inproceedings{lei2018sru,
  title={Simple Recurrent Units for Highly Parallelizable Recurrence},
  author={Tao Lei and Yu Zhang and Sida I. Wang and Hui Dai and Yoav Artzi},
  booktitle={Empirical Methods in Natural Language Processing (EMNLP)},
  year={2018}
}

When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute [paper]

@article{lei2021srupp,
  title={When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute},
  author={Tao Lei},
  journal={arXiv preprint arXiv:2102.12459},
  year={2021}
}

Requirements

PyTorch >=1.6 recommended
ninja

Install requirements via pip install -r requirements.txt.

Installation

From source:

To install SRU as a regular package, run python setup.py install or pip install . from the repository root.

From PyPI:

pip install sru

Directly use the source without installation:

Make sure this repo and CUDA library can be found by the system, e.g.

export PYTHONPATH=path_to_repo/sru
export LD_LIBRARY_PATH=/usr/local/cuda/lib64

Examples

The usage of SRU is similar to nn.LSTM. SRU likely requires more stacked layers than LSTM. We recommend starting with 2 layers and using more if necessary (see our report for more experimental details).

import torch
from sru import SRU, SRUCell

# input has length 20, batch size 32 and dimension 128
x = torch.randn(20, 32, 128).cuda()

input_size, hidden_size = 128, 128

rnn = SRU(input_size, hidden_size,
    num_layers = 2,          # number of stacking RNN layers
    dropout = 0.0,           # dropout applied between RNN layers
    bidirectional = False,   # bidirectional RNN
    layer_norm = False,      # apply layer normalization on the output of each layer
    highway_bias = -2,        # initial bias of highway gate (<= 0)
)
rnn.cuda()

output_states, c_states = rnn(x)      # forward pass

# output_states is (length, batch size, number of directions * hidden size)
# c_states is (layers, batch size, number of directions * hidden size)
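
The two shape comments above can be checked with a small helper. This is a minimal sketch in plain Python; sru_output_shapes is a hypothetical name used only for illustration and is not part of the sru package.

```python
def sru_output_shapes(length, batch_size, hidden_size,
                      num_layers=2, bidirectional=False):
    """Return the expected (output_states, c_states) shapes.

    Hypothetical helper: it only restates the shape comments from the
    example above and does not call the sru package.
    """
    num_directions = 2 if bidirectional else 1
    output_shape = (length, batch_size, num_directions * hidden_size)
    c_shape = (num_layers, batch_size, num_directions * hidden_size)
    return output_shape, c_shape


# Same sizes as the example: length 20, batch size 32, hidden size 128.
uni = sru_output_shapes(20, 32, 128, num_layers=2, bidirectional=False)
bi = sru_output_shapes(20, 32, 128, num_layers=2, bidirectional=True)
print(uni)
print(bi)
```

With bidirectional = True the last dimension doubles, because the states of both directions are concatenated.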

Contributing

Please read and follow the guidelines.

Other Implementations

@musyoku had a very nice SRU implementation in chainer.

@adrianbg implemented the first CPU version.

Comments

Enable both Pytorch native AMP and Nvidia APEX AMP for SRU
Hi!

I was happily using SRUs with PyTorch native AMP; however, I started experimenting with training using Microsoft DeepSpeed and ran into an issue.

Basically, I observed that FP16 training using DeepSpeed doesn't work for either GRUs or SRUs. However, when using Nvidia APEX AMP, DeepSpeed training with GRUs does work.

So, based on the tips in one of your issues, I started looking into how I could enable both PyTorch native AMP and Nvidia APEX AMP for SRUs, so I could train SRU-based models using DeepSpeed.

That is why I created this pull request. Basically, I found that by making the code simpler, I can make SRUs work with both methods of AMP.

Now amp_recurrence_fp16 can be used for both types of AMP. When amp_recurrence_fp16=True, the tensors are cast to float16; otherwise nothing special happens. So, I also removed the torch.cuda.amp.autocast(enabled=False) region; I might be wrong, but it seems that we don't need it.

I did some tests with my own code and it works in the different scenarios of interest:

Using PyTorch native AMP, not using DeepSpeed

Not using PyTorch native AMP, not using DeepSpeed

Using Nvidia APEX AMP, using DeepSpeed

Not using Nvidia APEX AMP, using DeepSpeed

It would be beneficial if we can test this with an official SRU repo test, maybe repurposing the language_model/train_lm.py?
opened by visionscaper 13
float16 handling

When I convert my model, which uses this SRU unit, to float16, it fails. Is SRU not implemented for use in a float16 environment, or is this hard to fix?
bug

opened by ywatanabe1989 11
support GPU inference in torchscript
This is on 3.0.0-dev branch for now

A non-trivial PR to support GPU inference in torchscript

Load CUDA kernels as non-python modules; this is needed for torchscript compilation

Refactored CUDA APIs as functions that return output as tensors, instead of procedures that modify some passed-in tensors.

Added a workaround in case TS tries to locate and compile CUDA methods on machines that don't have CUDA / GPUs

The refactored code has passed the forward() & backward() test. I also checked the outputs are the same for the non-torchscript and torchscript versions of the same model.
opened by taoleicn 8

Error unpacking PackedSequence on latest version

Hello @taolei87 , After updating to the latest version, my code broke. It works great on the previous 2.3.5 version and with nn.LSTM.

File "C:\xxx\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
  result = self.forward(*input, **kwargs)
File "C:\xxx\lib\site-packages\sru\modules.py", line 576, in forward
  mask_pad = (mask_pad >= batch_sizes.view(length, 1)).contiguous()
RuntimeError: shape '[393, 1]' is invalid for input of size 384

I can see that in the previous version the unpacking code on forward was different:

        input_packed = isinstance(input, nn.utils.rnn.PackedSequence)
        if input_packed:
            input, lengths = nn.utils.rnn.pad_packed_sequence(input)
            max_length = lengths.max().item()
            mask_pad = torch.ByteTensor([[0] * l + [1] * (max_length - l) for l in lengths.tolist()])
            mask_pad = mask_pad.to(input.device).transpose(0, 1).contiguous()

Now it is:


        orig_input = input
        if isinstance(orig_input, PackedSequence):
            input, batch_sizes, sorted_indices, unsorted_indices = input
            length = input.size(0)
            batch_size = input.size(1)
            mask_pad = torch.arange(batch_size,
                                    device=batch_sizes.device).expand(length, batch_size)
            mask_pad = (mask_pad >= batch_sizes.view(length, 1)).contiguous()

bug

opened by bratao 8
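
The two snippets quoted in this issue build the same kind of pad mask in different ways. The sketch below reproduces both in plain Python lists (no PyTorch) on a tiny batch. All function names are hypothetical, and the sequences are assumed to be sorted longest first. It also shows that the number of packed tokens, sum(batch_sizes), differs from the number of time steps, len(batch_sizes). Mixing up these two quantities is one possible reading of the shape error above, not a diagnosis confirmed in the thread.

```python
def batch_sizes_from_lengths(lengths):
    """Number of sequences still active at each time step."""
    return [sum(1 for n in lengths if n > t) for t in range(max(lengths))]


def mask_from_lengths(lengths):
    """Pad mask built like the 2.3.5 snippet, then transposed to (time, batch)."""
    max_length = max(lengths)
    rows = [[0] * n + [1] * (max_length - n) for n in lengths]
    return [[rows[b][t] for b in range(len(lengths))] for t in range(max_length)]


def mask_from_batch_sizes(batch_sizes, batch_size):
    """Pad mask from batch_sizes: entry (t, b) is 1 when b >= batch_sizes[t]."""
    # The number of rows is len(batch_sizes), i.e. the longest sequence length.
    return [[int(b >= batch_sizes[t]) for b in range(batch_size)]
            for t in range(len(batch_sizes))]


lengths = [4, 2, 1]  # assumed sorted longest first, as in a packed batch
batch_sizes = batch_sizes_from_lengths(lengths)
old_mask = mask_from_lengths(lengths)
new_mask = mask_from_batch_sizes(batch_sizes, len(lengths))

print(batch_sizes)                          # one entry per time step
print(sum(batch_sizes), len(batch_sizes))   # packed token count vs. time steps
print(old_mask == new_mask)                 # both constructions agree here
```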

Increasing GPU Usage each epoch

I'm trying to implement a model that includes an SRUCell. These are my specs:

Tesla M60 GPU
torch.version: 0.4.1.post2
torch.cuda.version: 9.0.176

Although it's training, GPU memory usage increases every epoch until the memory is full. I made a toy example where this error occurs:

import torch
from torch.autograd import Variable
from sru import SRUCell


batch_size = 5
seq_len = 60
epochs = 1000
cuda = torch.cuda.is_available()

model = SRUCell(100, 100)

if cuda:
    model.cuda(0)

optimizer = torch.optim.Adam([
        {'params':model.parameters()}], lr=1e-3)

loss_function = torch.nn.MSELoss()
    
seq = Variable(torch.rand(batch_size,seq_len,100))
y = Variable(torch.rand(batch_size,100))


if cuda:
    seq = seq.cuda(0)
    y = y.cuda(0)


model.train()

for e in range(epochs):
    model.zero_grad()
    
    h = Variable(torch.zeros(batch_size, 100))
    c = Variable(torch.zeros(batch_size, 100))
    
    if cuda:
        h = h.cuda(0)
        c = c.cuda(0)
    
    for i in range(seq_len):
        x = seq[:,i,:]
        h, c = model(x, c)
    loss = loss_function(h, y)
    loss.backward()
    optimizer.step()
    print('Epoch: {} - Loss: {}'.format(e, loss))

opened by santiag0m 8

Can I pass hidden states to the SRU cell forward like in vanilla PyTorch?

In vanilla PyTorch it works like this:

rnn = nn.LSTMCell(10, 20)
input = torch.randn(6, 3, 10)
hx = torch.randn(3, 20)
cx = torch.randn(3, 20)
output = []
for i in range(6):
    hx, cx = rnn(input[i], (hx, cx))
    output.append(hx)

How can I do the same for an SRU cell?

opened by hadaev8 7

AttributeError when preprocessing data for DrQA
First I ran download.sh, and it successfully downloaded GloVe and the train/dev JSON files for SQuAD. However, python prepro.py gave me this:

Traceback (most recent call last):
  File "prepro.py", line 243, in <module>
    vocab_tag = list(nlp.tagger.tag_names)
AttributeError: 'Tagger' object has no attribute 'tag_names'

My spaCy version is 2.0.3, and it seems that something broke in the update from the 1.x version listed in the requirements. I didn't succeed in fixing it myself. Any suggestions?
opened by mojesty 7

Calculating Backwards For SRU Results in CUDA error.

I'm not sure how, but I'm seeing this error when I try to compute the backwards function. Don't know if you've come across this during your debug?

Traceback (most recent call last):
  File "gan_language.py", line 341, in <module>
    G.backward(one)
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 156, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/__init__.py", line 98, in backward
    variables, grad_variables, retain_graph)
  File "/home/nick/wgan-gp/sru/cuda_functional.py", line 417, in backward
    stream=SRU_STREAM
  File "cupy/cuda/function.pyx", line 129, in cupy.cuda.function.Function.__call__ (cupy/cuda/function.cpp:4010)
  File "cupy/cuda/function.pyx", line 111, in cupy.cuda.function._launch (cupy/cuda/function.cpp:3647)
  File "cupy/cuda/driver.pyx", line 127, in cupy.cuda.driver.launchKernel (cupy/cuda/driver.cpp:2541)
  File "cupy/cuda/driver.pyx", line 62, in cupy.cuda.driver.check_status (cupy/cuda/driver.cpp:1446)
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_INVALID_HANDLE: invalid resource handle

opened by NickShahML 7

Speed up data loading / batching for ONE BILLION WORD experiment

The data loading was inefficient and was found to be the bottleneck of BILLION WORD training. This PR rewrote the sharding (which data goes to a certain GPU / training process), and improved the training speed significantly.

The figure compares a previous run and a new test run. We see a 40% reduction in training time.

This means our reported training efficiency will improve substantially, from 59 GPU days to 36 GPU days, making it 4x more efficient than the FairSeq Transformer results.

opened by taoleicn 6
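
The PR text does not show the new sharding scheme. As a rough illustration of what "which data goes to a certain GPU / training process" can mean, here is a minimal round-robin sketch in plain Python. shard_indices, rank and world_size are assumed names for illustration, not the PR's code. The last lines check the reported numbers: going from 59 to 36 GPU days is a reduction of roughly 39%, in line with the 40% quoted above.

```python
def shard_indices(num_items, rank, world_size):
    """Indices of the items assigned to one training process (round-robin).

    Hypothetical helper, not taken from the PR.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    return list(range(rank, num_items, world_size))


# Ten data items split over four processes: every item goes to exactly one.
shards = [shard_indices(10, rank, 4) for rank in range(4)]
for rank, shard in enumerate(shards):
    print(rank, shard)

# Relative reduction implied by the reported 59 -> 36 GPU days.
reduction = 1 - 36 / 59
print(round(reduction, 3))
```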
Different input dimension compared to output dimension

Hi, I'm trying to implement a naive version of this paper in Keras, and was wondering how the case n_in != n_out is handled.

I went through the code a few times, and couldn't understand the element-wise multiplication of (1 - r_t) with x_t if x_t has a different shape than r_t.
question

opened by titu1994 6
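
To make the question concrete, the highway term can be written as h_t = r_t * c_t + (1 - r_t) * x_t, which only type-checks when x_t and r_t have the same size. A common remedy for skip connections with mismatched sizes is to pass x_t through a linear projection first. The sketch below shows that idea in plain Python; highway_output and proj are hypothetical names, and whether the library does exactly this is not stated in this thread.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def highway_output(r, c, x, proj=None):
    """Compute h = r * c + (1 - r) * x', where x' is x or proj @ x.

    proj is an assumed (n_out x n_in) matrix used only when the input
    and output sizes differ.
    """
    skip = x if proj is None else matvec(proj, x)
    if len(skip) != len(r):
        raise ValueError(
            "skip input has size %d but gate has size %d" % (len(skip), len(r)))
    return [ri * ci + (1.0 - ri) * si for ri, ci, si in zip(r, c, skip)]


r = [0.5, 1.0]                       # reset gate, n_out = 2
c = [2.0, 3.0]                       # internal state, n_out = 2
x = [1.0, 1.0, 1.0]                  # input, n_in = 3
proj = [[1.0, 0.0, 1.0],
        [0.0, 2.0, 0.0]]             # maps n_in = 3 to n_out = 2

h = highway_output(r, c, x, proj)
print(h)
```

Calling highway_output(r, c, x) without proj raises a ValueError here, which is the shape mismatch the question describes.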
support GPU inference in torchscript model for v2.5 / v2.6
This PR works for master branch, v2.5 and v2.6 release

A non-trivial PR to support GPU inference in torchscript

Load CUDA kernels as non-python modules; this is needed for torchscript compilation

Refactored CUDA APIs as functions that return output as tensors, instead of procedures that modify some passed-in tensors.

Added a workaround in case TS tries to locate and compile CUDA methods on machines that don't have CUDA / GPUs

The refactored code has passed the forward() & backward() test.

I also checked the outputs are the same for the non-torchscript and torchscript versions of the same model.
opened by taoleicn 5
Mixed Precision Training

Hi,

First of all, I want to thank you for your great work. I'm using SRUs for speech enhancement, and they do very well at a reasonable computational cost.

I would like to know if it is possible to train SRUs in mixed precision mode. I tried to enable it by setting precision=16 in the PyTorch Lightning trainer, but that didn't do the trick.

Kind regards, Zadagu

opened by Zadagu 1
Any documentation on using SRU++ ?

Hello, I've read and really appreciated your team's wonderful work on SRU++. I want to use this architecture in other tasks, but I'm having trouble finding documentation on SRU++, in particular on how to use SRU++ the same way as SRU (calling it directly from the sru library after installing with pip install sru). I have looked into the dev-3.0.0 branch, which seems to be the most recently updated branch, but I still have no idea how to call and integrate SRU++ modules into my custom PyTorch modules. Could you help me?

opened by thangld201 1
FAILED: sru_cuda_kernel.cuda.o

When I run the example, I get this error: FAILED: sru_cuda_kernel.cuda.o, and at the end it reports ninja: build stopped: subcommand failed. What should I do to solve this problem?

opened by xianyu-123 0
Avoid unintended eager cuda initialization

We noticed that the package initialization for sru eagerly triggers CUDA initialization because of the following chain of module imports: sru.modules -> sru.ops -> cuda_functional. The last module executes the load function of torch.utils.cpp_extension.

This was detected because of issues caused when running with the server framework in SUBPROCESS_MODE, which forks a new process to run the model. We got an error complaining that CUDA had already been initialized in the parent process, which was unnecessary because the parent is not meant to run inference with the model.

This PR makes the loading lazier: we changed the code in sru.modules to avoid the eager import of sru.ops and instead postpone it to the instantiation of the first SRUCell.

The changes in this PR have been tested by checking out this branch on an AWS instance with a GPU and running pytest -sv test, which resulted in 141 passed, 161 warnings and no failures. So we understand this is working as expected for both CPU and GPU settings.

opened by dkasapp 0
Unknown builtin op: sru_cuda::sru_bi_forward_simple

When using a bidirectional SRU, regular usage seems to be fine, and compilation to torchscript proceeds without error, but upon trying to infer with the compiled torchscript I get:

Unknown builtin op: sru_cuda::sru_bi_forward_simple.

Using pytorch 1.10, sru 2.6.0, cuda 11.3

opened by ctlaltdefeat 2

Releases(v2.7.0-rc1)

v2.7.0-rc1(Jan 4, 2022)

Postponed the CUDA initialization to the instantiation of SRUCells in order to ensure that the process where it takes place is where we plan to run our model.
3.0.0.dev6(Jun 17, 2021)

More layer norm options; More info in _repr_()
v2.6.0(May 18, 2021)
Support GPU/CUDA inference in Torchscript model

Support post layer norm

Support custom init value for weight_c

Add unit tests for GPU inference

Add unit tests for backward()

Add more unit tests for Torchscript

v2.6.0.dev2(May 13, 2021)
Dev1:

Support GPU/CUDA inference in torchscript model

Support post layer norm

Support custom init value for weight_c

Dev2:

Fix an issue

2.6.0.dev3(May 13, 2021)
Support GPU/CUDA inference in torchscript model

Support post layer norm

Support custom init value for weight_c

v2.6.0.dev(May 12, 2021)
Support GPU/CUDA inference in torchscript model

Support post layer norm

Support custom init value for weight_c

v3.0.0.dev3(May 4, 2021)

Fix a typo. Add an option to only use attention_last_n_layers. Replace option normalize_after with normalization_type
v3.0.0.dev2(Mar 18, 2021)
Changes:

change weight_c_init from Optional[float] = None to float = 1.0

Bug fixes:

fix a potential memory leak in custom op

fix bug in cuda maskpad

torchscript compatible in torch 1.5.1 now

v3.0.0.dev1(Mar 5, 2021)
Note that the future 3.0.0 release and future 3.0.0 dev releases might not be backwards compatible with this dev release.

Key features / changes:

#160: SRU++ is now available. Unit tests are included for torchscript compatibility and correctness. Example language model training code is available.

#166: fp16 training improvement. The recurrence kernel will run in float16 now when amp is enabled. This gives an additional ~10% speedup on tested language model training, ~20% reduction on GPU memory usage and no regression on final results.

#167: Code clean-up. No autocast block needed in sru.ops.elementwise_recurrence_gpu. This would allow both Native AMP and APEX AMP to work. (Credit: @visionscaper)

Other changes:

Fix a dtype error within adaptive embedding (#168)

Significant speed-up on BILLION WORD training (#169)

LICENCE update requested by IPC (#165)

v3.0.0.dev0(Jan 22, 2021)
Note that future releases and dev releases of v3 might be backwards incompatible with this dev release.

This dev release:

custom_m renamed to transform_module

transform_module always used now (the weight and weight_proj parameters have been removed)

projection_size can take in a sequence of projection_sizes, one per layer

n_proj in SRUCell renamed to projection_size, for consistency

2.6.0-rc1(Dec 17, 2020)

2.5.1(Oct 12, 2020)

Fix a torchscript error when layer_norm is used. Added unit tests to cover this use case. Added unit tests for backward compatibility when loading checkpoints from older versions.
2.5.0(Oct 6, 2020)

2.4.3(Sep 25, 2020)

https://github.com/asappresearch/sru/pull/132
2.4.2b(Sep 22, 2020)

Fix a bug introduced after version 2.4 that handles PackedSequence incorrectly (#128).
2.4.1(Sep 14, 2020)

Fix a missing .cpp source file in the installed package. Fix the .circleci tests to catch such issues in the future.
2.4.0(Sep 12, 2020)

2.3.5(May 19, 2020)

2.3.4(May 19, 2020)

2.3.3(Mar 25, 2020)

2.3.2(Feb 23, 2020)

2.3.1(Feb 12, 2020)

2.2.1(Feb 1, 2020)

sru_module.make_backward_compatible()
2.2.0(Jan 31, 2020)

Torch 1.3 compatibility; custom modules in SRU
2.1.9(Oct 23, 2019)

2.1.8(Oct 23, 2019)

2.1.7(Oct 16, 2019)

2.1.6(Aug 28, 2019)
add '-O3' cflag for faster inference on CPU

a few other minor improvements for speed

2.1.2(Oct 18, 2018)

CPU optimized code for inference

This release adds a C++ implementation of SRU forward() at inference time (when is_grad_enabled() == False).

The implementation is built following the PyTorch tutorial, using Ninja and torch.utils.cpp_extension. https://pytorch.org/tutorials/advanced/cpp_extension.html

PyTorch 0.4.1 is required.
2.0.0(Aug 30, 2018)

Initial release to PyPI.

Owner

ASAPP Research

AI for Enterprise
