This package implements THOR: Transformer with Stochastic Experts.

Microsoft

Last update: Nov 22, 2022

Overview

THOR: Transformer with Stochastic Experts

This PyTorch package implements the paper "Taming Sparsely Activated Transformer with Stochastic Experts".

Installation

The most convenient way to run the code is to use the Docker image tartarusz/adv-train:azure-pytorch-apex-v1.7.0, which supports running on Microsoft Azure.
Our implementation is based on Fairseq.

Instructions

Download Fairseq (v1.0.0+) to the current directory.
Run pip install -e . to install the package locally.
To run a sample translation task on IWSLT'14 De-En, first follow the instructions here to download and tokenize the data, then use bash preprocess.sh to pre-process the tokenized data.
Run bash run.sh to train a THOR model.

Notes

Contact Information

For personal communication related to this package, please contact Simiao Zuo ([email protected]), Xiaodong Liu ([email protected]), or Jian Jiao ([email protected]).

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

Comments

How to run this on stand-alone GPU cluster

Hi, I am a student trying to run your code, but I cannot get it to run after following your instructions. I installed the "fairseq" repo inside the folder of your code repo, followed the steps, and was able to run the preprocessing step. When I try to train the model with "run_iwslt14_de_en.sh", I first get an error at "/Stochastic-Mixture-of-Experts/thor/transformer_thor_layer.py", line 21, in "class ThorTransformerEncoderLayer(TransformerEncoderLayer):". It was unable to find the "TransformerEncoderLayer" class. When I checked the corresponding package, "/fairseq/models/transformer/", there is no such class. It looks like it has been renamed to "TransformerEncoder". Accordingly, I made the following changes in "transformer_thor_layer.py":

#from fairseq.models.transformer import TransformerDecoderLayer, TransformerEncoderLayer
from fairseq.models.transformer.transformer_decoder import TransformerDecoder
from fairseq.models.transformer.transformer_encoder import TransformerEncoder

But now I get a new error:

File "/Stochastic-Mixture-of-Experts/thor/transformer_thor_layer.py", line 40, in __init__
    super().__init__(args)
TypeError: __init__() missing 2 required positional arguments: 'dictionary' and 'embed_tokens'

Can you please help with this? Thank you, Amit
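A possible reading of the two errors above, not confirmed by the maintainers: "TransformerEncoder" is the full encoder stack, whose constructor also expects "dictionary" and "embed_tokens", so it is not a drop-in replacement for the per-layer class "TransformerEncoderLayer". In the Fairseq versions we are aware of, the layer classes are exported from "fairseq.modules". The sketch below looks a class up in a list of candidate modules and returns the first match. The helper names (resolve_class, load_fairseq_layers, LAYER_MODULES) are hypothetical and not part of THOR or Fairseq, and the candidate module paths are assumptions to verify against your own Fairseq checkout.

```python
import importlib


def resolve_class(name, module_paths):
    """Return the first attribute called `name` found in `module_paths`.

    Modules that cannot be imported are skipped, so the same code can
    tolerate different package layouts.
    """
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError:
            continue  # this layout is not present, try the next candidate
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"{name} not found in any of: {', '.join(module_paths)}")


# Assumed candidate locations; check them against your installed Fairseq.
LAYER_MODULES = ["fairseq.modules", "fairseq.models.transformer"]


def load_fairseq_layers():
    """Look up the per-layer classes that the THOR layers subclass."""
    encoder_layer = resolve_class("TransformerEncoderLayer", LAYER_MODULES)
    decoder_layer = resolve_class("TransformerDecoderLayer", LAYER_MODULES)
    return encoder_layer, decoder_layer
```

If this reading is right, calling load_fairseq_layers() in place of the original import line in "transformer_thor_layer.py" would keep the THOR classes subclassing the layer classes rather than the full encoder and decoder, so super().__init__(args) would receive the arguments it expects.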

opened by amitchandak 1

Model Config

Thanks for your great work! I have some questions about the model config of both the dense model and the sparse model that have a similar size, such as number of layers and hidden size. Could you please provide some details about this?

opened by xwwwwww 0
