Implementing DropPath/StochasticDepth in PyTorch

Overview
%load_ext memory_profiler

Implementing Stochastic Depth/Drop Path In PyTorch

DropPath is available on glasses my computer vision library!

Introduction

Today we are going to implement Stochastic Depth also known as Drop Path in PyTorch! Stochastic Depth introduced by Gao Huang et al is technique to "deactivate" some layers during training.

Let's take a look at a normal ResNet Block that uses residual connections (like almost all models now).If you are not familiar with ResNet, I have an article showing how to implement it.

Basically, the block's output is added to its input: output = block(input) + input. This is called a residual connection

alt

Here we see four ResnNet like blocks, one after the other.

alt

Stochastic Depth/Drop Path will deactivate some of the block's weight

alt

The idea is to reduce the number of layers/block used during training, saving time and make the network generalize better.

Practically, this means setting to zero the output of the block before adding.

Implementation

Let's start by importing our best friend, torch.

import torch
from torch import nn
from torch import Tensor

We can define a 4D tensor (batch x channels x height x width), in our case let's just send 4 images with one pixel each :)

x = torch.ones((4, 1, 1, 1))

We need a tensor of shape batch x 1 x 1 x 1 that will be used to set some of the elements in the batch to zero, using a given prob. Bernoulli to the rescue!

keep_prob: float = .5
mask: Tensor = x.new_empty(x.shape[0], 1, 1, 1).bernoulli_(keep_prob)
    
mask
tensor([[[[0.]]],


        [[[1.]]],


        [[[1.]]],


        [[[1.]]]])

Btw, this is equivelant to

mask: Tensor = (torch.rand(x.shape[0], 1, 1, 1) > keep_prob).float()
mask
tensor([[[[1.]]],


        [[[1.]]],


        [[[1.]]],


        [[[1.]]]])

Before we multiply x by the mask we need to divide x by keep_prob to rescale down the inputs activation during training, see cs231n. So

x_scaled : Tensor = x / keep_prob
x_scaled
tensor([[[[2.]]],


        [[[2.]]],


        [[[2.]]],


        [[[2.]]]])

Finally

output: Tensor = x_scaled * mask
output
tensor([[[[2.]]],


        [[[2.]]],


        [[[2.]]],


        [[[2.]]]])

We can put together in a function

def drop_path(x: Tensor, keep_prob: float = 1.0) -> Tensor:
    mask: Tensor = x.new_empty(x.shape[0], 1, 1, 1).bernoulli_(keep_prob)
    x_scaled: Tensor = x / keep_prob
    return x_scaled * mask

drop_path(x, keep_prob=0.5)
tensor([[[[0.]]],


        [[[0.]]],


        [[[2.]]],


        [[[0.]]]])

We can also do the operation in place

def drop_path(x: Tensor, keep_prob: float = 1.0) -> Tensor:
    mask: Tensor = x.new_empty(x.shape[0], 1, 1, 1).bernoulli_(keep_prob)
    x.div_(keep_prob)
    x.mul_(mask)
    return x


drop_path(x, keep_prob=0.5)
tensor([[[[2.]]],


        [[[2.]]],


        [[[0.]]],


        [[[0.]]]])

However, we may want to use x somewhere else, and dividing x or mask by keep_prob is the same thing. Let's arrive at the final implementation

def drop_path(x: Tensor, keep_prob: float = 1.0, inplace: bool = False) -> Tensor:
    mask: Tensor = x.new_empty(x.shape[0], 1, 1, 1).bernoulli_(keep_prob)
    mask.div_(keep_prob)
    if inplace:
        x.mul_(mask)
    else:
        x = x * mask
    return x

x = torch.ones((4, 1, 1, 1))
drop_path(x, keep_prob=0.8)
tensor([[[[1.2500]]],


        [[[1.2500]]],


        [[[1.2500]]],


        [[[1.2500]]]])

drop_path only works for 2d data, we need to automatically calculate the number of dimensions from the input size to make it work for any data time

def drop_path(x: Tensor, keep_prob: float = 1.0, inplace: bool = False) -> Tensor:
    mask_shape: Tuple[int] = (x.shape[0],) + (1,) * (x.ndim - 1) 
    # remember tuples have the * operator -> (1,) * 3 = (1,1,1)
    mask: Tensor = x.new_empty(mask_shape).bernoulli_(keep_prob)
    mask.div_(keep_prob)
    if inplace:
        x.mul_(mask)
    else:
        x = x * mask
    return x

x = torch.ones((4, 1))
drop_path(x, keep_prob=0.8)
tensor([[0.],
        [0.],
        [0.],
        [0.]])

Let's create a nice DropPath nn.Module

class DropPath(nn.Module):
    def __init__(self, p: float = 0.5, inplace: bool = False):
        super().__init__()
        self.p = p
        self.inplace = inplace

    def forward(self, x: Tensor) -> Tensor:
        if self.training and self.p > 0:
            x = drop_path(x, self.p, self.inplace)
        return x

    def __repr__(self):
        return f"{self.__class__.__name__}(p={self.p})"

    
DropPath()(torch.ones((4, 1)))
tensor([[2.],
        [0.],
        [0.],
        [0.]])

Usage with Residual Connections

We have our DropPath, cool but how do we use it? We need a classic ResNet block, let's implement our good old friend BottleNeckBlock

from torch import nn


class ConvBnAct(nn.Sequential):
    def __init__(self, in_features: int, out_features: int, kernel_size=1):
        super().__init__(
            nn.Conv2d(in_features, out_features, kernel_size=kernel_size, padding=kernel_size // 2),
            nn.BatchNorm2d(out_features),
            nn.ReLU()
        )
         

class BottleNeck(nn.Module):
    def __init__(self, in_features: int, out_features: int, reduction: int = 4):
        super().__init__()
        self.block = nn.Sequential(
            # wide -> narrow
            ConvBnAct(in_features, out_features // reduction, kernel_size=1),
            # narrow -> narrow
            ConvBnAct( out_features // reduction, out_features // reduction, kernel_size=3),
            # wide -> narrow
            ConvBnAct( out_features // reduction, out_features, kernel_size=1),
        )
        # I am lazy, no shortcut etc
        
    def forward(self, x: Tensor) -> Tensor:
        res = x
        x = self.block(x)
        return x + res
    
    
BottleNeck(64, 64)(torch.ones((1,64, 28, 28))).shape
torch.Size([1, 64, 28, 28])

To deactivate the block the operation x + res must be equal to res, so our DropPath has to be applied after the block.

class BottleNeck(nn.Module):
    def __init__(self, in_features: int, out_features: int, reduction: int = 4):
        super().__init__()
        self.block = nn.Sequential(
            # wide -> narrow
            ConvBnAct(in_features, out_features // reduction, kernel_size=1),
            # narrow -> narrow
            ConvBnAct( out_features // reduction, out_features // reduction, kernel_size=3),
            # wide -> narrow
            ConvBnAct( out_features // reduction, out_features, kernel_size=1),
        )
        # I am lazy, no shortcut etc
        self.drop_path = DropPath()
        
    def forward(self, x: Tensor) -> Tensor:
        res = x
        x = self.block(x)
        x = self.drop_path(x)
        return x + res
    
BottleNeck(64, 64)(torch.ones((1,64, 28, 28)))
tensor([[[[1.0009, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000],
          [1.0134, 1.0034, 1.0034,  ..., 1.0034, 1.0034, 1.0000],
          [1.0134, 1.0034, 1.0034,  ..., 1.0034, 1.0034, 1.0000],
          ...,
          [1.0134, 1.0034, 1.0034,  ..., 1.0034, 1.0034, 1.0000],
          [1.0134, 1.0034, 1.0034,  ..., 1.0034, 1.0034, 1.0000],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000]],

         [[1.0005, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0421],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0421],
          ...,
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0421],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0421],
          [1.0000, 1.0011, 1.0011,  ..., 1.0011, 1.0011, 1.0247]],

         [[1.0203, 1.0123, 1.0123,  ..., 1.0123, 1.0123, 1.0299],
          [1.0000, 1.0005, 1.0005,  ..., 1.0005, 1.0005, 1.0548],
          [1.0000, 1.0005, 1.0005,  ..., 1.0005, 1.0005, 1.0548],
          ...,
          [1.0000, 1.0005, 1.0005,  ..., 1.0005, 1.0005, 1.0548],
          [1.0000, 1.0005, 1.0005,  ..., 1.0005, 1.0005, 1.0548],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000]],

         ...,

         [[1.0011, 1.0180, 1.0180,  ..., 1.0180, 1.0180, 1.0465],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0245],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0245],
          ...,
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0245],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0245],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000]],

         [[1.0130, 1.0170, 1.0170,  ..., 1.0170, 1.0170, 1.0213],
          [1.0052, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0065],
          [1.0052, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0065],
          ...,
          [1.0052, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0065],
          [1.0052, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0065],
          [1.0012, 1.0139, 1.0139,  ..., 1.0139, 1.0139, 1.0065]],

         [[1.0103, 1.0181, 1.0181,  ..., 1.0181, 1.0181, 1.0539],
          [1.0001, 1.0016, 1.0016,  ..., 1.0016, 1.0016, 1.0231],
          [1.0001, 1.0016, 1.0016,  ..., 1.0016, 1.0016, 1.0231],
          ...,
          [1.0001, 1.0016, 1.0016,  ..., 1.0016, 1.0016, 1.0231],
          [1.0001, 1.0016, 1.0016,  ..., 1.0016, 1.0016, 1.0231],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000]]]],
       grad_fn=<AddBackward0>)

Tada πŸŽ‰ ! Now, randomly, our .block will be completely skipped!


You might also like...
Final project code: Implementing BicycleGAN, for CIS680 FA21 at University of Pennsylvania

680 Final Project: BicycleGAN Haoran Tang Instructions 1. Training To train the network, please run train.py. Change hyper-parameters and folder paths

BisQue is a web-based platform designed to provide researchers with organizational and quantitative analysis tools for 5D image data. Users can extend BisQue by implementing containerized ML workflows.
BisQue is a web-based platform designed to provide researchers with organizational and quantitative analysis tools for 5D image data. Users can extend BisQue by implementing containerized ML workflows.

Overview BisQue is a web-based platform specifically designed to provide researchers with organizational and quantitative analysis tools for up to 5D

Creating a Linear Program Solver by Implementing the Simplex Method in Python with NumPy
Creating a Linear Program Solver by Implementing the Simplex Method in Python with NumPy

Creating a Linear Program Solver by Implementing the Simplex Method in Python with NumPy Simplex Algorithm is a popular algorithm for linear programmi

This repo is about implementing different approaches of pose estimation and also is a sub-task of the smart hospital bed project :smile:

Pose-Estimation This repo is a sub-task of the smart hospital bed project which is about implementing the task of pose estimation πŸ˜„ Many thanks to th

Pynomial - a lightweight python library for implementing the many confidence intervals for the risk parameter of a binomial model

Pynomial - a lightweight python library for implementing the many confidence intervals for the risk parameter of a binomial model

This is an open source library implementing hyperbox-based machine learning algorithms
This is an open source library implementing hyperbox-based machine learning algorithms

hyperbox-brain is a Python open source toolbox implementing hyperbox-based machine learning algorithms built on top of scikit-learn and is distributed

An essential implementation of BYOL in PyTorch + PyTorch Lightning
An essential implementation of BYOL in PyTorch + PyTorch Lightning

Essential BYOL A simple and complete implementation of Bootstrap your own latent: A new approach to self-supervised Learning in PyTorch + PyTorch Ligh

RealFormer-Pytorch Implementation of RealFormer using pytorch
RealFormer-Pytorch Implementation of RealFormer using pytorch

RealFormer-Pytorch Implementation of RealFormer using pytorch. Includes comparison with classical Transformer on image classification task (ViT) wrt C

Generic template to bootstrap your PyTorch project with PyTorch Lightning, Hydra, W&B, and DVC.

NN Template Generic template to bootstrap your PyTorch project. Click on Use this Template and avoid writing boilerplate code for: PyTorch Lightning,

Owner
Francesco Saverio Zuppichini
Computer Vision Engineer @ πŸ€— BSc informatics. MSc AI. Artificial Intelligence /Deep Learning Enthusiast & Full Stack developer
Francesco Saverio Zuppichini
FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.

Detectron is deprecated. Please see detectron2, a ground-up rewrite of Detectron in PyTorch. Detectron Detectron is Facebook AI Research's software sy

Facebook Research 25.5k Jan 7, 2023
A python library for implementing a recommender system

python-recsys A python library for implementing a recommender system. Installation Dependencies python-recsys is build on top of Divisi2, with csc-pys

Oscar Celma 1.5k Dec 17, 2022
Generating Anime Images by Implementing Deep Convolutional Generative Adversarial Networks paper

AnimeGAN - Deep Convolutional Generative Adverserial Network PyTorch implementation of DCGAN introduced in the paper: Unsupervised Representation Lear

Rohit Kukreja 23 Jul 21, 2022
Library for implementing reservoir computing models (echo state networks) for multivariate time series classification and clustering.

Framework overview This library allows to quickly implement different architectures based on Reservoir Computing (the family of approaches popularized

Filippo Bianchi 249 Dec 21, 2022
This is the code repository implementing the paper "TreePartNet: Neural Decomposition of Point Clouds for 3D Tree Reconstruction".

TreePartNet This is the code repository implementing the paper "TreePartNet: Neural Decomposition of Point Clouds for 3D Tree Reconstruction". Depende

εˆ˜ε½¦θΆ… 34 Nov 30, 2022
Facilitates implementing deep neural-network backbones, data augmentations

Introduction Nowadays, the training of Deep Learning models is fragmented and unified. When AI engineers face up with one specific task, the common wa

null 40 Dec 29, 2022
Implementing yolov4 target detection and tracking based on nao robot

Implementing yolov4 target detection and tracking based on nao robot

null 6 Apr 19, 2022
Implementing DeepMind's Fast Reinforcement Learning paper

Fast Reinforcement Learning This is a repo where I implement the algorithms in the paper, Fast reinforcement learning with generalized policy updates.

Marcus Chiam 6 Nov 28, 2022
Implementing Graph Convolutional Networks and Information Retrieval Mechanisms using pure Python and NumPy

Implementing Graph Convolutional Networks and Information Retrieval Mechanisms using pure Python and NumPy

Noah Getz 3 Jun 22, 2022
Final project code: Implementing MAE with downscaled encoders and datasets, for ESE546 FA21 at University of Pennsylvania

546 Final Project: Masked Autoencoder Haoran Tang, Qirui Wu 1. Training To train the network, please run mae_pretraining.py. Please modify folder path

Haoran Tang 0 Apr 22, 2022