Storchastic is a PyTorch library for stochastic gradient estimation in Deep Learning

Overview

Stochastic Deep Learning for PyTorch

Storchastic is a PyTorch library for stochastic gradient estimation in Deep Learning [1]. Many state-of-the-art deep learning models use gradient estimation, in particular within the fields of Variational Inference and Reinforcement Learning. While PyTorch computes gradients of deterministic computation graphs automatically, it will not estimate gradients on stochastic computation graphs [2].

With Storchastic, you can easily define any stochastic deep learning model and let it estimate the gradients for you. Storchastic provides a wide range of gradient estimation methods that you can plug and play to figure out which one works best for your problem. It also provides automatic broadcasting of sampled batch dimensions, which increases code readability and makes it easy to implement complex models.

When dealing with continuous random variables and differentiable functions, the popular reparameterization method [3] is usually very effective. However, this method is not applicable when dealing with discrete random variables or non-differentiable functions. This is why Storchastic has a focus on gradient estimators for discrete random variables, non-differentiable functions and sequence models.

Documentation on Read the Docs.

Example: Discrete Variational Auto-Encoder
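
The discrete VAE example itself lives in examples/vae/discrete_vae.py (it is benchmarked in one of the issues below). As a stand-in, here is a minimal sketch of the basic workflow, pieced together from the snippets in the issues further down: sample through a pluggable gradient estimation Method, register a cost, and call storch.backward(). The choice of GumbelSoftmax and the Bernoulli toy distribution is illustrative, not the repository's own example.

    import storch
    import torch
    from torch.distributions import Bernoulli
    from storch.method import GumbelSoftmax

    torch.manual_seed(0)
    p = torch.tensor(0.5, requires_grad=True)

    # Pick a gradient estimation method; swapping it (e.g. for Expect or RELAX)
    # leaves the rest of the model unchanged.
    method = GumbelSoftmax("z")

    z = method(Bernoulli(p))      # stochastic node on the plate "z"
    storch.add_cost(z, "cost")    # register the quantity to differentiate
    storch.backward()             # estimate gradients of the expected cost w.r.t. p
    print(p.grad)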

Installation

pip install storchastic

Requires PyTorch 1.5 (older versions will not work!) and Pyro. The code is built on Python 3.7. The master branch works with PyTorch 1.7, but the version on pip is not yet compatible with it. Binaries will be updated soon.

Algorithms

Feel free to create an issue if an estimator is missing here.

  • Reparameterization [1, 3]
  • Score Function (REINFORCE) with Moving Average baseline [1, 4]
  • Score Function with Batch Average Baseline [5, 6]
  • Expected value for enumerable distributions
  • (Straight through) Gumbel Softmax [7, 8]
  • LAX, RELAX [9]
  • REBAR [10]
  • REINFORCE Without Replacement [6]
  • Unordered Set Estimator [13]

In development

  • Memory Augmented Policy Optimization [11]
  • Rao-Blackwellized REINFORCE [12]

Planned

  • Measure valued derivatives [1, 14]
  • ARM [15]
  • Automatic Credit Assignment [16]
  • ...

References

Comments
  • ValueError: The value argument to log_prob must be a Tensor

    I use python=3.8 and pytorch=1.8.1 and tested the example bernoulli_toy.py. The error happens in the following line in expect.py: log_probs = tensor.distribution.log_prob(tensor). The error is ValueError: The value argument to log_prob must be a Tensor.

    It seems .log_prob() wants a Tensor object as input. When I print tensor, it outputs:

    x: Stochastic tensor([[0., 0., 0., 0.],
            [0., 0., 0., 1.],
            [0., 0., 1., 0.],
            [0., 0., 1., 1.],
            [0., 1., 0., 0.],
            [0., 1., 0., 1.],
            [0., 1., 1., 0.],
            [0., 1., 1., 1.],
            [1., 0., 0., 0.],
            [1., 0., 0., 1.],
            [1., 0., 1., 0.],
            [1., 0., 1., 1.],
            [1., 1., 0., 0.],
            [1., 1., 0., 1.],
            [1., 1., 1., 0.],
            [1., 1., 1., 1.]]) Batch links: [('x', 16, tensor(0.0625))]
    

    Could you offer some solutions? Thank you

    opened by zhaoguangyuan123 8
  • Recursion limit reached with many cost nodes

    Running backward() when there are a lot of cost nodes causes a recursion-limit-reached exception to be thrown.

    I'm able to consistently generate this exception running the following:

    import storch
    import torch
    from torch.distributions import Bernoulli
    from storch.method import GumbelSoftmax
    
    torch.manual_seed(0)
    
    p = torch.tensor(0.5, requires_grad=True)
    
    for i in range(1000):
        sample = GumbelSoftmax(f"sample_{i}")(Bernoulli(p))
        storch.add_cost(sample, f"cost_{i}")
    
    storch.backward()
    

    which produces the following trace:

    Traceback (most recent call last):
      File "...", line 15, in <module>
        storch.backward()
      File ".../lib/python3.8/site-packages/storch/inference.py", line 217, in backward
        accum_loss._clean()
      File ".../lib/python3.8/site-packages/storch/tensor.py", line 401, in _clean
        node._clean()
      File ".../lib/python3.8/site-packages/storch/tensor.py", line 401, in _clean
        node._clean()
      File ".../lib/python3.8/site-packages/storch/tensor.py", line 401, in _clean
        node._clean()
      [Previous line repeated 994 more times]
      File ".../lib/python3.8/site-packages/storch/tensor.py", line 399, in _clean
        node._clean()
    RecursionError: maximum recursion depth exceeded
    
    bug 
    opened by csmith49 4
  • Compatibility with Conv networks

    I'm currently working with CNNs, which expect an input x of shape [N, C, H, W].

    However, when I mark N as the batch dim (using storch.denote_independent(x, 0, "data")) and sample with RELAX("z", n_samples=1) (which adds another plate), the sampled tensor has shape [Z, N, C, H, W] and is no longer compatible with Conv2d.

    Is there any way to reshape z to be compatible with Conv2d?

    Minimal reproduction code:

    import torch
    from torch import nn
    from torch.distributions import Categorical
    from storch.method import RELAX

    class RelaxSample(nn.Module):
        def __init__(self, k: int, c: int):
            super().__init__()
            self._k = k
            self._wv = nn.Parameter(torch.empty(k, c))
            self._method = RELAX("z", n_samples=1, in_dim=[k], rebar=True)
    
        def forward(self, latent):
            # [n, c, h, w]
            n, c, h, w = latent.shape
            # [n, h, w, c]
            q = latent.permute(0, 2, 3, 1)
            # [n, h, w, c], [k, c] -> [n, h, w, k]
            logit = torch.einsum("nhwc,kc->nhwk", q, self._wv)
            varPosterior = Categorical(logits=logit)
            # [z, n, h, w, k]
            z = self._method(varPosterior)
            # [z, n, c, h, w]
            z = torch.einsum("znhwk,kc->znhwc", z, self._wv).permute(0, 1, 4, 2, 3)
            # ********* z is not compatible with Conv2D *********
            return z
    
    opened by xiaosu-zhu 3
  • RELAX/REBAR missing mc_sample

    RELAX and REBAR gradient estimators don't appear to be usable on Bernoulli random variables - some operation expects them to have the mc_sample attribute, which is missing.

    I can reproduce this result with the following:

    import storch
    import torch
    from torch.distributions import Bernoulli
    from storch.method import RELAX
    
    torch.manual_seed(0)
    
    p = torch.tensor(0.5, requires_grad=True)
    d = Bernoulli(p)
    sample = RELAX("sample")(d)
    storch.add_cost(sample, "cost")
    storch.backward()
    

    which produces the following trace:

    Traceback (most recent call last):
      File "...", line 13, in <module>
        sample = RELAX("sample")(d)
      File ".../lib/python3.8/site-packages/storch/method/relax.py", line 210, in __init__
        super().__init__(plate_name, sampling_method.set_mc_sample(self.mc_sample))
      File ".../lib/python3.8/site-packages/torch/nn/modules/module.py", line 771, in __getattr__
        raise ModuleAttributeError("'{}' object has no attribute '{}'".format(
    torch.nn.modules.module.ModuleAttributeError: 'RELAX' object has no attribute 'mc_sample'
    
    opened by csmith49 2
  • Plating to denote conditional independencies

    Often, samples from a distribution are conditionally independent, as in LDA, for example. While it is possible to create a list of K conditionally independent variables by looping over sample steps, this is not vectorized and thus not efficient (see the sketch below).

    Plating could possibly be done by calling storch.plate() on a storch tensor.
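
    As a hedged illustration of the loop-based workaround mentioned above (plate names and the Bernoulli toy distribution are placeholders, mirroring the snippets in other issues):

    import storch
    import torch
    from torch.distributions import Bernoulli
    from storch.method import GumbelSoftmax

    p = torch.tensor(0.5, requires_grad=True)
    K = 4

    # One sample call (and one plate) per conditionally independent draw:
    # correct, but the K draws are not vectorized.
    samples = [GumbelSoftmax(f"z_{k}")(Bernoulli(p)) for k in range(K)]
    for k, z in enumerate(samples):
        storch.add_cost(z, f"cost_{k}")
    storch.backward()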

    enhancement 
    opened by HEmile 2
  • Examples with Bernoulli distributions failing

    Issue

    Running either of the examples with Bernoulli distributions (examples/bernoulli_toy.py and examples/bernoulli_grad_var.py) results in a ValueError, specifically:

    ValueError: Input arguments must all be instances of numbers.Number or torch.tensor.
    

    Environment

    Running Python 3.8.5, with Pyro 1.5.0 and Torch 1.7.0.

    Traceback

    Running python3 examples/bernoulli_toy.py from the root of the repo produces (with better_exceptions enabled):

    Traceback (most recent call last):
      File "examples/bernoulli_toy.py", line 29, in <module>
        experiment(Expect("x"))
        │          └ <class 'storch.method.method.Expect'>
        └ <function experiment at 0x7fbf47b3a0d0>
      File "examples/bernoulli_toy.py", line 19, in experiment
        x = method(b)
            │      └ Bernoulli(probs: torch.Size([4]), logits: torch.Size([4]))
            └ Expect(
      (sampling_method): Enumerate()
    )
      File ".../python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
                 │             │        └ {}
                 │             └ (Bernoulli(probs: torch.Size([4]), logits: torch.Size([4])),)
                 └ Expect(
      (sampling_method): Enumerate()
    )
      File ".../python3.8/site-packages/storch/method/method.py", line 52, in forward
        return self.sample(distr)
               │           └ Bernoulli(probs: torch.Size([4]), logits: torch.Size([4]))
               └ Expect(
      (sampling_method): Enumerate()
    )
      File ".../python3.8/site-packages/storch/method/method.py", line 114, in sample
        batch_weighting = self.sampling_method.plate_weighting(s_tensor, plate)
                          │                                    │         └ ('x', 16, tensor(0.0625))
                          │                                    └ tensor([[0., 0., 0., 0.],
            [0., 0., 0., 1.],
            [0., 0., 1., 0.],
            [0., 0., 1., 1.],
            [0., 1., 0., 0.]...
                          └ Expect(
      (sampling_method): Enumerate()
    )
      File ".../python3.8/site-packages/storch/sampling/expect.py", line 82, in plate_weighting
        log_probs = tensor.distribution.log_prob(tensor)
                    │                            └ tensor([[0., 0., 0., 0.],
            [0., 0., 0., 1.],
            [0., 0., 1., 0.],
            [0., 0., 1., 1.],
            [0., 1., 0., 0.]...
                    └ tensor([[0., 0., 0., 0.],
            [0., 0., 0., 1.],
            [0., 0., 1., 0.],
            [0., 0., 1., 1.],
            [0., 1., 0., 0.]...
      File ".../python3.8/site-packages/torch/distributions/bernoulli.py", line 94, in log_prob
        logits, value = broadcast_all(self.logits, value)
                │       │             │            └ tensor([[0., 0., 0., 0.],
            [0., 0., 0., 1.],
            [0., 0., 1., 0.],
            [0., 0., 1., 1.],
            [0., 1., 0., 0.]...
                │       │             └ Bernoulli(probs: torch.Size([4]), logits: torch.Size([4]))
                │       └ <function broadcast_all at 0x7fbe57163430>
                └ tensor([[0., 0., 0., 0.],
            [0., 0., 0., 1.],
            [0., 0., 1., 0.],
            [0., 0., 1., 1.],
            [0., 1., 0., 0.]...
      File ".../python3.8/site-packages/torch/distributions/utils.py", line 24, in broadcast_all
        raise ValueError('Input arguments must all be instances of numbers.Number or torch.tensor.')
    ValueError: Input arguments must all be instances of numbers.Number or torch.tensor.
    
    
    bug 
    opened by csmith49 1
  • Abstract sampling procedures

    We should abstract sampling procedures so that they are no longer part of a Method's inheritance tree, but are instead separate objects that can be plugged and played. This would make it easier to combine various estimators with various sampling methods, whether biased or not.

    Example applications:

    • https://openreview.net/forum?id=H1Xw62kRZ uses beam search with a biased REINFORCE method. Allowing to use any estimator with this beam search would help modularity and debugging.
    • Usually, plain MC sampling is enough, but when decoding text, for example, we have to deal with EOS tokens that complicate the sampling procedure.
    • Sampling without replacement is applicable to any unbiased estimator when correcting for the bias through the weighting. By making sampling modular, we can more easily test combinations of estimators.
    enhancement 
    opened by HEmile 1
  • Add option to Method to ignore estimation

    Methods like reparameterization do not require computing an estimator. We could skip some transposes by manually checking for this using a function in Method.

    opened by HEmile 1
  • Ignore deterministic tensor checking during sampling

    Tensors resulting from sampling may not correspond to their plates. Solution: create a "sampling" context in which the wrappers are temporarily disabled so that sampling is no longer a problem.

    enhancement 
    opened by HEmile 1
  • Implement Expect

    Implementing Expect is not as easy as I hoped. Unfortunately, in discrete VAEs, for example, you pass logits with shape [batch, latents, categories]. enumerate_support() does not recognize that batch is independent while the different latents are dependent: it only enumerates the support over the categories dimension, but it should also enumerate over the latents dimension (see the illustration below). Doing this automatically, however, requires independence assumptions, which probably brings us back to Issue #4.
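
    A plain-PyTorch illustration of the shape behaviour described above (the sizes are made up for the example):

    import torch
    from torch.distributions import OneHotCategorical

    batch, latents, categories = 2, 3, 4
    d = OneHotCategorical(logits=torch.zeros(batch, latents, categories))

    support = d.enumerate_support()
    print(support.shape)
    # torch.Size([4, 2, 3, 4]): only the 4 categories are enumerated per (batch, latent)
    # entry, whereas enumerating the joint over the 3 dependent latents would require
    # 4 ** 3 = 64 configurations.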

    opened by HEmile 1
  • Implement the unordered set estimator

    https://arxiv.org/abs/2002.06043

    The unordered set estimator is mostly a sampling strategy. It probably also provides variance reduction for estimators other than the score function. Is it possible to decouple sampling from the rest of the estimators? In that case, reweighting the costs is probably enough...

    It won't actually work with reparameterization: the reparameterized estimator will not properly weight the different samples. This needs rethinking.

    enhancement help wanted 
    opened by HEmile 1
  • Bump certifi from 2020.12.5 to 2022.12.7

    Bumps certifi from 2020.12.5 to 2022.12.7.

    dependencies 
    opened by dependabot[bot] 0
  • Refactor: Use torchdim instead of Storchastic's plating system

    Storchastic uses a rather intricate system for batching over multiple dimensions, but it is rather buggy and hard for end users to work with. PyTorch 1.12 recently introduced torchdim, with first-class dimension objects that should serve the same purpose as plates in Storchastic. Adopting this standard would likely give us code that is cleaner, faster, and easier to read, write and debug. See https://colab.research.google.com/drive/1BsVkddtVMX35aZAvo2GyI-wSFPVBCWuA#scrollTo=8511a637

    It implements

    • Implicit batching: Two batch dimensions are joined together, just like in Storchastic.
    • Mixed named tensors: only batch dimensions need to be named; event dimensions can remain numeric, as in Storchastic.
    enhancement 
    opened by HEmile 0
  • comparison between gumbel softmax, rebar, and relax

    Hi @HEmile, I tested the example in examples/vae/discrete_vae.py, but found that Gumbel-Softmax performs much better than REBAR, RELAX and REINFORCE (test loss after 10 epochs: 98 for Gumbel-Softmax, 165 for REBAR, 208 for RELAX, 209 for REINFORCE). This is not consistent with the results in your guide, where Gumbel-Softmax is slightly worse than REINFORCE in terms of training loss. Is there anything wrong with the example? I didn't change any parameters. Looking forward to your response.

    opened by fnzhan 4
  • Concatenating Reparameterized and UnorderedSetEstimator samples into one tensor

    Hi! I need to concatenate samples from these two methods into one tensor, so that I can sample from a continuous distribution using reparameterization, and sample from a discrete distribution using UnorderedSetEstimator. Is this functionality something that can be built in? It seems that this is not possible given the current iteration (see below for example error). Thank you!

    import storch
    import torch
    import torch.distributions as td
    method1 = storch.method.Reparameterization
    method2 = storch.method.UnorderedSetEstimator
    
    method1 = method1(plate_name="1",n_samples=25)
    method2 = method2(plate_name="1",k=25)
    p1 = td.Independent(td.Normal(loc=torch.zeros([1000,2]),scale=torch.ones([1000,2])),0)
    p2 = td.Independent(td.OneHotCategorical(probs=torch.zeros([1000,3]).uniform_()),0)
    
    samp1 = method1(p1)
    samp2 = method2(p2)
    samp1.shape # torch.Size([25, 1000, 2])
    samp2.shape # torch.Size([25, 1000, 3])
    
    storch.cat([samp1,samp2],2)
    ValueError: Received a plate with name 1 that is not also an AncestralPlate.
    
    enhancement 
    opened by DavidKLim 1
  • Implement GO/Generalized Stochastic Backpropagation

    See https://hal.archives-ouvertes.fr/hal-02968975/document

    Implement: for a multivariate Bernoulli, sample a single base sample, then for each dimension take this sample and flip the corresponding dimension (see the sketch below).

    Run the base sample and the flipped samples. Weight the original sample with 1. The multiplicative term is the sum of the parameters for the base sample, where each parameter is negated if the corresponding dimension is 1. For the flipped samples, use -parameter if the dimension flipped to 0, otherwise use the parameter as-is (i.e., not negated).

    To make the zeroth order correct, use importance sampling for the flipped samples. In the multiplicative estimator, divide by this importance weight again to ensure correctness.

    This doesn't use a baseline.
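
    A minimal sketch, in plain PyTorch, of just the sample-construction step described above; the multiplicative weighting and importance correction are deliberately left out, and the dimensionality is an arbitrary choice:

    import torch
    from torch.distributions import Bernoulli

    D = 5
    theta = torch.rand(D)                 # parameters of a D-dimensional Bernoulli
    base = Bernoulli(theta).sample()      # the single base sample

    # For each dimension d, copy the base sample and flip dimension d.
    flipped = base.repeat(D, 1)
    flipped[torch.arange(D), torch.arange(D)] = 1.0 - base

    # `base` and the D rows of `flipped` are then evaluated and combined with the
    # weights described in the text above.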

    opened by HEmile 1