Overview

labml.ai Deep Learning Paper Implementations

This is a collection of simple PyTorch implementations of neural networks and related algorithms. These implementations are documented with explanations, and the website renders them as side-by-side formatted notes. We believe these will help you understand these algorithms better.

Screenshot

We are actively maintaining this repo and adding new implementations almost weekly. Follow us on Twitter for updates.

Modules

Transformers

Recurrent Highway Networks

LSTM

HyperNetworks - HyperLSTM

ResNet

ConvMixer

Capsule Networks

Generative Adversarial Networks

Diffusion models

Sketch RNN

Graph Neural Networks

Counterfactual Regret Minimization (CFR)

Solving games with incomplete information such as poker with CFR.

Reinforcement Learning

Optimizers

Normalization Layers

Distillation

Adaptive Computation

Uncertainty

Installation

pip install labml-nn
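
Once installed, the implementations can be used as ordinary PyTorch modules. Below is a minimal sketch; the module path labml_nn.transformers.mha and the exact constructor and forward signatures are assumptions to check against the documentation at https://nn.labml.ai/.

    import torch
    from labml_nn.transformers.mha import MultiHeadAttention

    # 8 attention heads over a 512-dimensional model (assumed constructor arguments)
    mha = MultiHeadAttention(heads=8, d_model=512)

    # inputs are assumed to be shaped [seq_len, batch_size, d_model]
    x = torch.randn(10, 2, 512)
    out = mha(query=x, key=x, value=x)
    print(out.shape)  # expected: torch.Size([10, 2, 512])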

Citing

If you use this for academic research, please cite it using the following BibTeX entry.

@misc{labml,
 author = {Varuna Jayasiri and Nipun Wijerathne},
 title = {labml.ai Annotated Paper Implementations},
 year = {2020},
 url = {https://nn.labml.ai/},
}

Other Projects

🚀 Trending Research Papers

This shows the most popular research papers on social media. It also aggregates links to useful resources such as paper explanation videos and discussions.

🧪 labml.ai/labml

This is a library that lets you monitor deep learning model training and hardware usage from your mobile phone. It also comes with a bunch of other tools to help you write deep learning code efficiently.
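
For reference, here is a minimal sketch of what tracking a training run with labml typically looks like. The experiment.record and tracker.save calls follow the labml examples, but treat the exact names as assumptions and check the labml documentation.

    from labml import experiment, tracker

    def train_step(step: int) -> float:
        # stand-in for a real training step
        return 1.0 / (step + 1)

    # metrics saved through the tracker show up in the labml app,
    # including the mobile view
    with experiment.record(name='sample'):
        for step in range(100):
            loss = train_step(step)
            tracker.save(step, {'loss': loss})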

Comments
  • The gradient flow of Switch transformer seems wrong?

    In a Mixture of Experts (MoE) layer, the output is a weighted sum of the experts' outputs. The contribution of each expert depends on its gate value, and that is what the gate weights are optimized from. In the Switch Transformer, specifically, only one expert is picked to contribute to the final output. I expected the gradient signal of the gate to come from multiplying the scaled route probability with the output of the picked expert. However, I found that the scaled router probability is multiplied with the input of the experts, as this line shows. I am wondering whether this is wrong. Looking forward to your reply.

    opened by hobbitlzy 7
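
As background for the comment above, here is an illustrative sketch (not code from this repo) of a top-1, switch-style MoE layer in which the scaled routing probability multiplies the selected expert's output, which is what lets gradients reach the router:

    import torch
    import torch.nn as nn

    class ToyTop1MoE(nn.Module):
        """Toy switch-style layer with two experts, for illustration only."""
        def __init__(self, d_model: int = 16):
            super().__init__()
            self.experts = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(2)])
            self.router = nn.Linear(d_model, 2)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            probs = torch.softmax(self.router(x), dim=-1)  # [batch, n_experts]
            top_p, top_idx = probs.max(dim=-1)             # top-1 routing decision
            out = torch.zeros_like(x)
            for i, expert in enumerate(self.experts):
                mask = top_idx == i
                if mask.any():
                    # scale the expert *output* by the routing probability,
                    # so the router receives a gradient signal
                    out[mask] = top_p[mask].unsqueeze(-1) * expert(x[mask])
            return out

    moe = ToyTop1MoE()
    y = moe(torch.randn(4, 16))
    y.sum().backward()  # gradients flow into moe.router as well as the experts
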
  • torch version to run a demo code for FNet

    Hi guys! Thanks for the awesome work! I am excited to play with the code. Here is the gist: https://gist.github.com/ra312/f3c895aba6e8954985258de10e9be52f On my first attempt, I encountered this exception:

    image

    I believe the right version of PyTorch should help fix this! These are my current dependencies:

    image

    If someone can advise on torch version, it would save me time and be super cool! Many thanks, Rauan.

    enhancement 
    opened by ra312 6
  • Internal Covariate Shift example in BatchNorm

    Hello! The example of Internal Covariate Shift in the introduction to Batch Normalization seemed strange to me. It says that

    The paper defines Internal Covariate Shift as the change in the distribution of network activations due to the change in network parameters during training.

    However, the example given is:

    For example, let’s say there are two layers l1 and l2. During the beginning of the training l1 outputs (inputs to l2) could be in distribution N(0.5,1). Then, after some training steps, it could move to N(0.5,1). This is internal covariate shift.

    There is no difference in the parameters of the l1 output distribution. Maybe there should be different values of mean and variance?

    bug 
    opened by vklyukin 6
  • Stride setting in ResNet implementation

    The given implementation puts all layers into a list blocks, and the condition to set the stride to 2 is len(blocks) == 1. So the image size is only reduced in the first block and stays the same in all other blocks. This does not match the comments in the implementation code or the paper. In addition, why not put the different blocks into an nn.Sequential, so that data can be fed through the model easily and the output size of each layer can be printed? This would also avoid this error.

    question 
    opened by JingxuanKang 4
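
As a general illustration of the convention discussed above (a hypothetical sketch, not the repo's ResNet code): the first block of each new feature-map size uses stride 2, except at the very first size, and wrapping the stages in nn.Sequential makes it easy to print the output size at each stage.

    import torch
    import torch.nn as nn

    def make_stage(in_channels: int, out_channels: int, n_blocks: int, first_stage: bool) -> nn.Sequential:
        """Hypothetical helper: stride 2 on the first block of every stage except the first stage."""
        layers = []
        for i in range(n_blocks):
            stride = 2 if (i == 0 and not first_stage) else 1
            layers.append(nn.Conv2d(in_channels if i == 0 else out_channels,
                                    out_channels, kernel_size=3, stride=stride, padding=1))
            layers.append(nn.ReLU())
        return nn.Sequential(*layers)

    stages = nn.Sequential(
        make_stage(3, 64, n_blocks=2, first_stage=True),      # 32x32 -> 32x32
        make_stage(64, 128, n_blocks=2, first_stage=False),   # 32x32 -> 16x16
        make_stage(128, 256, n_blocks=2, first_stage=False),  # 16x16 -> 8x8
    )

    x = torch.randn(1, 3, 32, 32)
    for stage in stages:
        x = stage(x)
        print(x.shape)  # inspect the output size of each stage
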
  • U-Net Number of Up and Down layers

    Hi, thanks for sharing these models.

    In your implementation of the U-Net model, the number of UpBlock modules seems to be larger than the number of DownBlock modules, whereas the original model was introduced with symmetrical up and down paths. Is there a particular reason for this difference?

    I am referring to the following lines: https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/f169f3a71dd2d36eb28ad31062d3475efa367b88/labml_nn/diffusion/ddpm/unet.py#L357-L368

    question 
    opened by maxjcohen 4
  • Issue with autoregressive_experiment.py

    Hey

    I was trying out the code here - https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/transformers/basic/autoregressive_experiment.py - and it throws a ZeroDivisionError:

    Traceback (most recent call last):
      File "/home/aflah/anaconda3/envs/pytorch/lib/python3.7/threading.py", line 926, in _bootstrap_inner
        self.run()
      File "/home/aflah/anaconda3/envs/pytorch/lib/python3.7/site-packages/labml/internal/api/__init__.py", line 97, in run
        packets = self._get_packets()
      File "/home/aflah/anaconda3/envs/pytorch/lib/python3.7/site-packages/labml/internal/api/__init__.py", line 92, in _get_packets
        packets = [s.get_data_packet() for s in sources]
      File "/home/aflah/anaconda3/envs/pytorch/lib/python3.7/site-packages/labml/internal/api/__init__.py", line 92, in <listcomp>
        packets = [s.get_data_packet() for s in sources]
      File "/home/aflah/anaconda3/envs/pytorch/lib/python3.7/site-packages/labml/internal/tracker/writers/web_api.py", line 71, in get_data_packet
        self.data['track'] = self.get_and_clear_indicators()
      File "/home/aflah/anaconda3/envs/pytorch/lib/python3.7/site-packages/labml/internal/tracker/writers/web_api.py", line 113, in get_and_clear_indicators
        step = self._mean(step, max_buffer_size)
      File "/home/aflah/anaconda3/envs/pytorch/lib/python3.7/site-packages/labml/internal/tracker/writers/web_api.py", line 85, in _mean       
        blocks = (len(values) + n_elems - 1) // n_elems
    ZeroDivisionError: integer division or modulo by zero
    

    I'm not quite sure how to fix this because this seems to be in some other module file

    bug 
    opened by aflah02 4
  • Own Annotated Implementations

    Hi guys, great work, and thanks for this great tool.

    I wanted to know whether I can write my own annotations for code. I see that we have to add docstrings and comments in the code to get the annotations, e.g.: https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/transformers/vit/init.py

    But how can I export it to an HTML file or to the website? Is it possible?

    opened by abdksyed 4
  • Stride Issue in ResNet?

    https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/b10e3dea2c7e1f6569bfdf8e1a48f8d48b5a645d/labml_nn/resnet/init.py#L286-L288

    Your annotation in the above code says: "The first block for the new feature map size, will have a stride length of 2, except fro the very first block".

    On the contrary, your implementation makes the first convolution of the very first block have a stride of 2, and the first convolutions of the other blocks have a stride of 1. I think this is an issue.

    bug question 
    opened by zwqnju 4
  • ParityDataset not deterministic

    First of all, great job on the implementation of PonderNet!!

    I have a quick question about the experimental setup. It seems like your ParityDataset is not deterministic. https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/068225aa16ee9ae97dfdca163fa9055d5b9ab7a3/labml_nn/adaptive_computation/parity.py#L44-L62

    I guess for training this is very convenient; however, it also means that evaluating on this dataset will give random results.

    However, if this was intended, please ignore this issue:)

    question 
    opened by jankrepl 4
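
A common way to make such a randomly generated dataset deterministic for evaluation is to derive the random state from the item index. The sketch below is hypothetical (it is not the repo's ParityDataset), just an illustration of that pattern:

    import torch
    from torch.utils.data import Dataset

    class DeterministicParityDataset(Dataset):
        """Hypothetical variant: seeding a generator by index makes __getitem__ reproducible."""
        def __init__(self, n_samples: int, n_elems: int = 64):
            self.n_samples = n_samples
            self.n_elems = n_elems

        def __len__(self) -> int:
            return self.n_samples

        def __getitem__(self, idx: int):
            g = torch.Generator().manual_seed(idx)
            # pick how many elements are non-zero, fill them with random +/-1, then shuffle
            n_non_zero = int(torch.randint(1, self.n_elems + 1, (1,), generator=g))
            x = torch.zeros(self.n_elems)
            x[:n_non_zero] = torch.randint(0, 2, (n_non_zero,), generator=g).float() * 2 - 1
            x = x[torch.randperm(self.n_elems, generator=g)]
            y = (x == 1.0).sum() % 2  # parity of the number of ones
            return x, y
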
  • GATv2 refactoring

    Hi @vpj

    • Fixed readme
    • Added some clarification about this GATv2 implementation
    • Updated the dropout in the Core experiment, and added updated links to the experiment.
    • The paper authors' Twitter IDs are @shakedbr, @urialon1, @yahave

    Can you generate the docs (and the experiment comparison)?

    Thanks!

    Shaked

    opened by shakedbr 4
  • Bug in SA for DDPM UNet?

    https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/05632f9f8e0de4657c210a13954a81f9556fd1ed/labml_nn/diffusion/ddpm/unet.py#L188

    According to my understanding of self-attention, shouldn't the softmax operation be done along the j axis of the einsum?

    https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/05632f9f8e0de4657c210a13954a81f9556fd1ed/labml_nn/diffusion/ddpm/unet.py#L190

    So, I think the code should be attn = attn.softmax(dim=2). Please correct me if I am wrong.

    However, the Attention module (with bug?) seems to work somehow, at least on the CIFAR-10 dataset.

    bug 
    opened by FutureXiang 3
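
For reference, here is a minimal self-attention sketch (not the repo's code) using the same einsum layout, with the softmax normalizing over the key index j, i.e. dim=2:

    import torch

    seq, heads, d_k = 8, 4, 16
    q = torch.randn(2, seq, heads, d_k)
    k = torch.randn(2, seq, heads, d_k)
    v = torch.randn(2, seq, heads, d_k)

    # attn[b, i, j, h]: similarity of query position i with key position j
    attn = torch.einsum('bihd,bjhd->bijh', q, k) * (d_k ** -0.5)

    # normalize over the key axis j (dim=2) so each query's weights sum to 1
    attn = attn.softmax(dim=2)

    out = torch.einsum('bijh,bjhd->bihd', attn, v)
    print(out.shape)  # torch.Size([2, 8, 4, 16])
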
  • StyleGAN2: Why don't you multiply path length penalty by the lazy regularization interval?

    In the paper and the StyleGAN2 code based on it, it is mentioned that when using the lazy regularization technique, the regularization terms should be multiplied "by k to balance the overall magnitude of its gradients". Although you do that with the gradient penalty, the path length penalty (PLP) is added to the generator loss without being multiplied by anything. In other implementations of this paper, both GP and PLP are multiplied (though those implementations also have a separate regularization step, unlike yours). I have not tested whether there are any improvements with this change, but there were definitely some when I decreased the lazy path penalty interval from 32 to 4 (when it was 32, the latent vector was ignored, as mentioned in one of the issues). Is there any reason not to multiply PLP by the interval, or is it a bug?

    opened by yanisnotavocado 0
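
For context, here is a minimal sketch of lazy regularization as described in the StyleGAN2 paper (hypothetical names, not the repo's code): the regularization term is applied only every k steps and multiplied by k to keep the overall magnitude of its gradients balanced.

    import torch

    lazy_interval = 4  # hypothetical regularization interval k

    def generator_loss(step: int, adversarial_loss: torch.Tensor,
                       path_length_penalty: torch.Tensor) -> torch.Tensor:
        loss = adversarial_loss
        if step % lazy_interval == 0:
            # the penalty is only added every k steps, so scale it by k
            # to keep the average magnitude of its gradients unchanged
            loss = loss + lazy_interval * path_length_penalty
        return loss
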
  • Error displaying widget: model not found

    When I use JupyterLab to run the ipynb file, it always prints 'Error displaying widget: model not found' for experiment.configs, configs.init() and with experiment.start(): configs.run(). The sampling figure can be shown, but the training progress is not visible. However, if I run the py file from the terminal, the training progress is available but no images are shown. Any idea how to solve this issue? Thanks!

    opened by KKK06 1
  • Request - Object Detection Papers

    Currently, labml.ai has no implementations of object detection papers such as the YOLO family, FPN, or RetinaNet.

    Do you have any timeline to share them as well?

    paper implementation 
    opened by abhiksark 5
  • [BUG] StyleGAN2: latent vector is ignored

    The implementation of StyleGAN2 does not learn a mapping for the latent vector z. The vector z is completely ignored, and the variety of the generated images comes from the noise inputs. To demonstrate the issue, I created a Google Colab notebook with a pre-trained model that I trained for 55,400 iterations.

    Images generated with a random z and fixed noise: image

    Images generated with a fixed z and random noise: image

    bug 
    opened by karray 1
  • Siamese Recurrent Architectures for Learning Sentence Similarity

    I would like to understand the following paper

    Mueller, J. and Thyagarajan, A., 2016, March. "Siamese recurrent architectures for learning sentence similarity". In Proceedings of the AAAI conference on artificial intelligence (Vol. 30, No. 1).

    available from https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/download/12195/12023

    It would be great if you could annotate it like the ones you have already.

    paper implementation 
    opened by mamonu 0