An unofficial styleguide and best practices summary for PyTorch

IgorSusmelj

Last update: Jan 5, 2023


Overview

PyTorch Tools, Best Practices & Styleguide

This is not an official style guide for PyTorch. This document summarizes best practices from more than a year of experience with deep learning using the PyTorch framework. Note that the learnings we share come mostly from a research and startup perspective.

This is an open project, and contributions from other collaborators who want to edit and improve the document are very welcome.

You will find three main parts in this document. First, a quick recap of best practices in Python, followed by some tips and recommendations for PyTorch. Finally, we share some insights and experiences with other frameworks that have generally helped us improve our workflow.

Update 20.12.2020

Added a full example of training a model on CIFAR-10
Added a setup guide for using VS Code with the remote extension

Update 30.4.2019

After so much positive feedback, I also added a summary of commonly used building blocks from our projects at Lightly. You will find building blocks for Self-Attention, Perceptual Loss using VGG, Spectral Normalization, Adaptive Instance Normalization, and more.
Code Snippets for Losses, Layers and other building blocks

We recommend using Python 3.6+

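From our experience, we recommend using Python 3.6+ because of the following features, which are very handy for writing clean and simple code:

Type hints via the typing module, including variable annotations (added in Python 3.6)
Formatted string literals, i.e. f-strings (added in Python 3.6)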
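Both features fit naturally into PyTorch code. The snippet below is only an illustrative sketch: `describe_batch` is a hypothetical helper, not part of PyTorch. It annotates arguments and the return type with `typing` and builds a summary string with an f-string and a format spec.

```python
from typing import List, Tuple

import torch


def describe_batch(batch: torch.Tensor, labels: List[int]) -> Tuple[int, str]:
    """Return the batch size and a short human-readable summary (hypothetical helper)."""
    batch_size = batch.shape[0]
    # f-strings can inline arbitrary expressions and format specs such as :.2f
    summary = f"{batch_size} samples, mean={batch.mean().item():.2f}, classes={sorted(set(labels))}"
    return batch_size, summary


size, text = describe_batch(torch.zeros(4, 3), [1, 0, 1, 1])
print(text)  # 4 samples, mean=0.00, classes=[0, 1]
```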

Python Styleguide recap

We try to follow the Google Styleguide for Python. Please refer to the well-documented style guide on Python code provided by Google: https://github.com/google/styleguide/blob/gh-pages/pyguide.md

We provide here a summary of the most commonly used rules:

Naming Conventions

From section 3.16.4 of the Google Python Style Guide:

Type	Convention	Example
Packages & Modules	lower_with_under	from prefetch_generator import BackgroundGenerator
Classes	CapWords	class DataLoader
Constants	CAPS_WITH_UNDER	BATCH_SIZE=16
Instances	lower_with_under	dataset = Dataset()
Methods & Functions	lower_with_under()	def visualize_tensor()
Variables	lower_with_under	background_color='Blue'

IDEs

Code Editors

In general, we recommend the use of an IDE such as Visual Studio Code or PyCharm. VS Code has become very powerful with its fast-growing ecosystem of extensions.

Setting up Visual Studio Code with a Remote Machine

Make sure you have the following extensions installed:

Python (linting, autocompletion, syntax highlighting, code formatting)
Remote - SSH (to work with remote machines)

Follow the guide here: https://code.visualstudio.com/docs/remote/remote-overview

Setting up PyCharm to work with a Remote Machine

Log in to your remote machine (AWS, Google, etc.)
Create a new folder and a new virtual environment
In PyCharm (Professional Edition), set up a remote interpreter in the project settings
Configure the remote Python interpreter (path to the venv on AWS, Google, etc.)
Configure the mapping of the code from your local machine to the remote machine

If set up properly this allows you to do the following:

Code on your local computer (notebook, desktop) wherever you want (offline, online)
Sync local code with your remote machine
Additional packages will be installed automatically on the remote machine
You don't need any dataset on your local machine
Run and debug the code on the remote machine as if it were your local machine

Jupyter Notebook vs Python Scripts

In general, we recommend using Jupyter notebooks for initial exploration and for playing around with new models and code. Python scripts should be used as soon as you want to train the model on a bigger dataset, where reproducibility also becomes more important.

Our recommended workflow:

Start with a Jupyter notebook
Explore the data and models
Build your classes/methods inside cells of the notebook
Move your code to Python scripts
Train/deploy on a server

Jupyter Notebook	Python Scripts
+ Exploration	+ Running longer jobs without interruption
+ Debugging	+ Easy to track changes with git
- Can become a huge file	- Debugging mostly means rerunning the whole script
- Can be interrupted (don't use for long training)
- Prone to errors and can become a mess

Libraries

Commonly used libraries:

Name	Description	Used for
torch	Base Framework for working with neural networks	creating tensors, networks and training them using backprop
torchvision	Datasets, model architectures and image transformations for computer vision	data preprocessing, augmentation, postprocessing
Pillow (PIL)	Python Imaging Library	Loading images and storing them
NumPy	Package for scientific computing with Python	Data preprocessing & postprocessing
prefetch_generator	Library for background processing	Loading next batch in background during computation
tqdm	Progress bar	Progress during training of each epoch
torchsummary	Keras summary for PyTorch	Displays the network, its parameters and sizes at each layer
tensorboardX	Tensorboard without tensorflow	Logging experiments and showing them in tensorboard

File Organization

Don't put all layers and models into the same file. A best practice is to separate the final networks into a separate file (networks.py) and keep the layers, losses, and ops in their respective files (layers.py, losses.py, ops.py). The finished model (composed of one or multiple networks) should be referenced in a file named after it (e.g. yolov3.py, DCGAN.py).

The main routine, i.e. the train and test scripts, should only import from the file named after the model.

Building a Neural Network in PyTorch

We recommend breaking up the network into smaller, reusable pieces. A network is an nn.Module consisting of operations or other nn.Modules as building blocks. Loss functions are also nn.Modules and can therefore be integrated directly into the network.

A class inheriting from nn.Module must have a forward method implementing the forward pass of the respective layer or operation.

An nn.Module is applied to input data by calling it, e.g. self.net(input). This uses the object's __call__() method, which feeds the input through the module's forward() method.

output = self.net(input)

A Simple Network in PyTorch

Use the following pattern for simple networks with a single input and single output:

import torch.nn as nn

class ConvBlock(nn.Module):
    def __init__(self):
        super(ConvBlock, self).__init__()
        self.block = nn.Sequential(
            nn.Conv2d(...),
            nn.ReLU(),
            nn.BatchNorm2d(...)
        )

    def forward(self, x):
        return self.block(x)

class SimpleNetwork(nn.Module):
    def __init__(self, num_resnet_blocks=6):
        super(SimpleNetwork, self).__init__()
        # here we add the individual layers
        # (ResBlock stands for a residual block such as the ResnetBlock in the next section)
        layers = [ConvBlock(...)]
        for _ in range(num_resnet_blocks):
            layers += [ResBlock(...)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

Note the following:

We reuse simple, recurring building blocks such as ConvBlock, which consists of the same repeated pattern (convolution, activation, normalization), and put them into a separate nn.Module
We build up a list of desired layers and finally turn them into a model using nn.Sequential(). We use the * operator before the list object to unpack it.
In the forward pass we just run the input through the model
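To see the pattern run end to end, here is a minimal runnable sketch. The `...` arguments above are filled with assumed values (3x3 convolutions with padding 1, 16 channels), and `ResBlock` is a hypothetical minimal residual block built from `ConvBlock`; none of these choices come from the original guide.

```python
import torch
import torch.nn as nn


class ConvBlock(nn.Module):
    """Convolution -> activation -> normalization."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(out_channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)


class ResBlock(nn.Module):
    """Hypothetical minimal residual block: x + ConvBlock(x)."""

    def __init__(self, channels: int):
        super().__init__()
        self.block = ConvBlock(channels, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.block(x)


class SimpleNetwork(nn.Module):
    def __init__(self, in_channels: int = 3, channels: int = 16, num_resnet_blocks: int = 6):
        super().__init__()
        # build a plain Python list of layers, then unpack it into nn.Sequential
        layers = [ConvBlock(in_channels, channels)]
        for _ in range(num_resnet_blocks):
            layers += [ResBlock(channels)]
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


net = SimpleNetwork()
out = net(torch.randn(2, 3, 32, 32))
print(out.shape)  # torch.Size([2, 16, 32, 32])
```

Because every block is registered as a submodule through `nn.Sequential`, `net.parameters()` and `net.to(device)` pick up all layers automatically.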

A Network with skip connections in PyTorch

import torch.nn as nn

class ResnetBlock(nn.Module):
    def __init__(self, dim, padding_type, norm_layer, use_dropout, use_bias):
        super(ResnetBlock, self).__init__()
        self.conv_block = self.build_conv_block(dim, padding_type, norm_layer,
                                                use_dropout, use_bias)

    def build_conv_block(self, dim, padding_type, norm_layer, use_dropout, use_bias):
        conv_block = []

        conv_block += [nn.Conv2d(...),
                       norm_layer(...),
                       nn.ReLU()]
        if use_dropout:
            conv_block += [nn.Dropout(...)]

        conv_block += [nn.Conv2d(...),
                       norm_layer(...)]

        return nn.Sequential(*conv_block)

    def forward(self, x):
        # skip connection: add the block's input to its output
        out = x + self.conv_block(x)
        return out

Here the skip connection of a ResNet block has been implemented directly in the forward pass. PyTorch allows for dynamic operations during the forward pass.
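A runnable sketch of the same block, assuming 3x3 convolutions that keep the channel count `dim` constant and mapping `padding_type` onto the `padding_mode` argument of `nn.Conv2d` ('zeros', 'reflect', 'replicate' or 'circular'). The defaults (reflection padding, `BatchNorm2d`, dropout rate 0.5) are assumptions for illustration:

```python
import torch
import torch.nn as nn


class ResnetBlock(nn.Module):
    def __init__(self, dim, padding_type="reflect", norm_layer=nn.BatchNorm2d,
                 use_dropout=False, use_bias=False):
        super().__init__()
        self.conv_block = self.build_conv_block(dim, padding_type, norm_layer,
                                                use_dropout, use_bias)

    def build_conv_block(self, dim, padding_type, norm_layer, use_dropout, use_bias):
        # padding_type is forwarded to Conv2d's padding_mode argument
        conv_block = [nn.Conv2d(dim, dim, kernel_size=3, padding=1,
                                padding_mode=padding_type, bias=use_bias),
                      norm_layer(dim),
                      nn.ReLU(inplace=True)]
        if use_dropout:
            conv_block += [nn.Dropout(0.5)]
        conv_block += [nn.Conv2d(dim, dim, kernel_size=3, padding=1,
                                 padding_mode=padding_type, bias=use_bias),
                       norm_layer(dim)]
        return nn.Sequential(*conv_block)

    def forward(self, x):
        # the skip connection is ordinary tensor arithmetic inside forward()
        return x + self.conv_block(x)


block = ResnetBlock(dim=64)
x = torch.randn(2, 64, 32, 32)
print(block(x).shape)  # torch.Size([2, 64, 32, 32])
```

Since `forward` is plain Python, any control flow (loops, conditionals) can be used there without a special graph API.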

A Network with multiple outputs in PyTorch

For a network requiring multiple outputs, such as building a perceptual loss using a pretrained VGG network we use the following pattern:

import torch
import torch.nn as nn
from torchvision import models

class Vgg19(nn.Module):
    def __init__(self, requires_grad=False):
        super(Vgg19, self).__init__()
        # torchvision >= 0.13 uses the weights API; older versions use pretrained=True
        vgg_pretrained_features = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features
        self.slice1 = torch.nn.Sequential()
        self.slice2 = torch.nn.Sequential()
        self.slice3 = torch.nn.Sequential()

        for x in range(7):
            self.slice1.add_module(str(x), vgg_pretrained_features[x])
        for x in range(7, 21):
            self.slice2.add_module(str(x), vgg_pretrained_features[x])
        for x in range(21, 30):
            self.slice3.add_module(str(x), vgg_pretrained_features[x])
        if not requires_grad:
            # freeze all VGG weights
            for param in self.parameters():
                param.requires_grad = False

    def forward(self, x):
        h_relu1 = self.slice1(x)
        h_relu2 = self.slice2(h_relu1)
        h_relu3 = self.slice3(h_relu2)
        out = [h_relu1, h_relu2, h_relu3]
        return out

Note here the following:

We use a pretrained model provided by torchvision.
We split up the network into three slices. Each slice consists of layers from the pretrained model.
We freeze the network by setting requires_grad = False
We return a list with the three outputs of our slices

Custom Loss

Even though PyTorch already provides many standard loss functions, it may sometimes be necessary to create your own. To do this, create a separate file losses.py and extend the nn.Module class to create your custom loss function:

import torch
import torch.nn as nn

class CustomLoss(nn.Module):

    def __init__(self):
        super(CustomLoss, self).__init__()

    def forward(self, x, y):
        # mean squared error as an example
        loss = torch.mean((x - y)**2)
        return loss
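Losses that need configuration simply take it in `__init__`. The example below is hypothetical (a weighted mix of L1 and MSE, not something the guide prescribes); it also shows a quick sanity check against the built-in `torch.nn.functional` losses, and that autograd differentiates the custom loss without a hand-written `backward`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class WeightedL1MSELoss(nn.Module):
    """Hypothetical example: alpha * L1 + (1 - alpha) * MSE."""

    def __init__(self, alpha: float = 0.5):
        super().__init__()
        self.alpha = alpha  # configuration lives in __init__

    def forward(self, x, y):
        diff = x - y
        return self.alpha * diff.abs().mean() + (1 - self.alpha) * (diff ** 2).mean()


pred = torch.randn(4, 3, requires_grad=True)
target = torch.randn(4, 3)
# alpha=0 reduces to plain MSE, which we can compare with the built-in loss
print(torch.allclose(WeightedL1MSELoss(alpha=0.0)(pred, target), F.mse_loss(pred, target)))  # True
WeightedL1MSELoss()(pred, target).backward()  # autograd handles the backward pass
```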

Recommended code structure for training your model

A full example is provided in the cifar10-example folder of this repository.

Note that we used the following patterns:

We use BackgroundGenerator from prefetch_generator to load the next batches in the background; see this issue for more information
We use tqdm to monitor training progress and show the compute efficiency. This helps us find bottlenecks in our data loading pipeline.

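# import statements
import argparse
import os
import time
import numpy as np
import torch
import torch.nn as nn
from torch.utils import data
from torchvision import datasets, transforms
from tqdm import tqdm
from prefetch_generator import BackgroundGenerator
from tensorboardX import SummaryWriter
...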

# set flags / seeds
torch.backends.cudnn.benchmark = True
np.random.seed(1)
torch.manual_seed(1)
torch.cuda.manual_seed(1)
...

# Start with main code
if __name__ == '__main__':
    # argparse for additional flags for experiment
    parser = argparse.ArgumentParser(description="Train a network for ...")
    ...
    opt = parser.parse_args() 
    
    # add code for datasets (we always use train and validation/ test set)
    data_transforms = transforms.Compose([
        transforms.Resize((opt.img_size, opt.img_size)),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])
    
    train_dataset = datasets.ImageFolder(
        root=os.path.join(opt.path_to_data, "train"),
        transform=data_transforms)
    train_data_loader = data.DataLoader(train_dataset, ...)
    
    test_dataset = datasets.ImageFolder(
        root=os.path.join(opt.path_to_data, "test"),
        transform=data_transforms)
    test_data_loader = data.DataLoader(test_dataset, ...)
    ...
    
    # instantiate network (which has been imported from *networks.py*)
    net = MyNetwork(...)
    ...
    
    # create losses (criterion in pytorch)
    criterion_L1 = torch.nn.L1Loss()
    ...
    
    # if running on GPU and we want to use cuda move model there
    use_cuda = torch.cuda.is_available()
    if use_cuda:
        net = net.cuda()
        ...
    
    # create optimizers
    optim = torch.optim.Adam(net.parameters(), lr=opt.lr)
    ...
    
    # load checkpoint if needed/ wanted
    start_n_iter = 0
    start_epoch = 0
    if opt.resume:
        ckpt = load_checkpoint(opt.path_to_checkpoint) # custom method for loading last checkpoint
        net.load_state_dict(ckpt['net'])
        start_epoch = ckpt['epoch']
        start_n_iter = ckpt['n_iter']
        optim.load_state_dict(ckpt['optim'])
        print("last checkpoint restored")
        ...
        
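    # if we want to run the experiment on multiple GPUs we move the models there
    # (when saving checkpoints later, use net.module.state_dict() so the keys
    # match the unwrapped model that load_state_dict() above expects)
    net = torch.nn.DataParallel(net)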
    ...
    
    # typically we use tensorboardX to keep track of experiments
    writer = SummaryWriter(...)
    
    # now we start the main loop
    n_iter = start_n_iter
    for epoch in range(start_epoch, opt.epochs):
        # set models to train mode
        net.train()
        ...
        
        # use prefetch_generator and tqdm for iterating through data
        pbar = tqdm(enumerate(BackgroundGenerator(train_data_loader, ...)),
                    total=len(train_data_loader))
        start_time = time.time()
        
        # for loop going through dataset
        for i, batch in pbar:
            # data preparation (don't call this `data`: it would shadow torch.utils.data)
            img, label = batch
            if use_cuda:
                img = img.cuda()
                label = label.cuda()
            ...

            # It's very good practice to keep track of preparation time and computation time using tqdm to find any issues in your dataloader
            prepare_time = time.time() - start_time

            # forward and backward pass
            optim.zero_grad()
            ...
            loss.backward()
            optim.step()
            ...

            # update tensorboardX
            writer.add_scalar(..., n_iter)
            ...

            # compute computation time and *compute_efficiency*
            process_time = time.time() - start_time - prepare_time
            pbar.set_description("Compute efficiency: {:.2f}, epoch: {}/{}:".format(
                process_time/(process_time+prepare_time), epoch, opt.epochs))
            n_iter += 1
            start_time = time.time()
            
        # maybe do a test pass every x epochs
        if epoch % x == x-1:
            # bring models to evaluation mode
            net.eval()
            ...
            # do some tests (no gradients are needed for evaluation)
            pbar = tqdm(enumerate(BackgroundGenerator(test_data_loader, ...)),
                        total=len(test_data_loader))
            with torch.no_grad():
                for i, batch in pbar:
                    ...
                
            # save checkpoint if needed
            ...
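The timing logic can be tried in isolation. The sketch below is a stripped-down, runnable version of the loop that assumes a toy dataset of random features and drops tqdm, prefetch_generator and TensorBoard so it has no extra dependencies. Both times are measured as elapsed time since the end of the previous step, so `process_time / (process_time + prepare_time)` stays between 0 and 1; on a GPU, `torch.cuda.synchronize()` is needed before reading the clock because CUDA kernels run asynchronously.

```python
import time

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(net, loader, criterion, optim, device):
    """Train for one epoch and return the mean compute efficiency (0..1)."""
    net.train()
    efficiencies = []
    start_time = time.perf_counter()
    for img, label in loader:
        img, label = img.to(device), label.to(device)
        # time spent waiting for (and moving) data since the last step finished
        prepare_time = time.perf_counter() - start_time

        optim.zero_grad()
        loss = criterion(net(img), label)
        loss.backward()
        optim.step()
        if device.type == "cuda":
            torch.cuda.synchronize()  # wait for queued GPU work before timing

        # time spent on forward, backward and the parameter update
        process_time = time.perf_counter() - start_time - prepare_time
        total = process_time + prepare_time
        if total > 0:
            efficiencies.append(process_time / total)
        start_time = time.perf_counter()
    return sum(efficiencies) / max(len(efficiencies), 1)


# toy setup: random features and binary labels stand in for a real dataset
torch.manual_seed(1)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
loader = DataLoader(TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,))),
                    batch_size=32, shuffle=True)
net = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2)).to(device)
optim = torch.optim.Adam(net.parameters(), lr=1e-3)
efficiency = train_one_epoch(net, loader, nn.CrossEntropyLoss(), optim, device)
print(f"compute efficiency: {efficiency:.2f}")
```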

Training on Multiple GPUs in PyTorch

There are two distinct patterns in PyTorch for using multiple GPUs for training. In our experience, both patterns are valid. The first one, however, results in cleaner and shorter code. The second one seems to have a slight performance advantage due to less communication between the GPUs. I asked a question about the two approaches in the official PyTorch forum here

Split up the batch input of each network

The most common one is to simply split up the batches of all networks to the individual GPUs.

A model running on 1 GPU with batch size 64 would therefore run on 2 GPUs with a batch size of 32 each. This can be done automatically by wrapping the model in nn.DataParallel(model).
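What `nn.DataParallel` does with the batch can be illustrated on a single device. The sketch below (the helper `split_batch_forward` is hypothetical) scatters the batch along dimension 0, runs the model on each chunk and concatenates the results. For layers that treat samples independently, the result equals a single full-batch pass; layers such as `BatchNorm2d` instead compute their statistics per chunk, i.e. per GPU.

```python
import torch
import torch.nn as nn


def split_batch_forward(model, batch, num_replicas):
    """Single-device illustration of how nn.DataParallel splits a batch."""
    chunks = torch.chunk(batch, num_replicas, dim=0)  # scatter: 64 -> 2 x 32
    outputs = [model(chunk) for chunk in chunks]      # each replica sees one chunk
    return torch.cat(outputs, dim=0)                  # gather along the batch dim


model = nn.Linear(8, 4)
batch = torch.randn(64, 8)
print([chunk.shape[0] for chunk in torch.chunk(batch, 2)])  # [32, 32]
print(torch.allclose(split_batch_forward(model, batch, 2), model(batch), atol=1e-6))  # True

# the real wrapper, only worthwhile when several GPUs are visible
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model.cuda())
```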

Pack all networks in a super network and split up input batch

This pattern is less commonly used. A repository implementing this approach is the pix2pixHD implementation by Nvidia: https://github.com/NVIDIA/pix2pixHD

Do's and Don'ts

Avoid Numpy Code in the forward method of a nn.Module

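NumPy runs on the CPU and is slower than torch code. Moreover, converting a tensor to NumPy inside forward takes it out of the autograd graph and requires copying GPU tensors to the CPU. Since torch was designed to be similar to NumPy, most NumPy functions already have PyTorch equivalents.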
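As a sketch of the torch-only approach, here is a hypothetical per-sample standardization module. `mean`, `sqrt` and elementwise arithmetic mirror their NumPy counterparts but stay on the input's device and remain differentiable; calling `x.numpy()` here would instead fail for CUDA tensors and for tensors that require gradients.

```python
import numpy as np
import torch
import torch.nn as nn


class Standardize(nn.Module):
    """Hypothetical layer: standardize each sample (row) using torch ops only."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mean = x.mean(dim=1, keepdim=True)
        # population variance, matching np.std's default ddof=0
        var = ((x - mean) ** 2).mean(dim=1, keepdim=True)
        return (x - mean) / (var.sqrt() + 1e-8)


x = torch.randn(4, 10, requires_grad=True)
out = Standardize()(x)
out.sum().backward()  # gradients flow because we never left torch
```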

Separate the DataLoader from the main Code

The data loading pipeline should be independent of your main training code. PyTorch's DataLoader can use background worker processes (num_workers) to load the data more efficiently and without disturbing the main training process.
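A minimal sketch of keeping data loading separate: the dataset lives in its own class (in a project, its own file such as `datasets.py`), and a `DataLoader` with `num_workers > 0` runs `__getitem__` in background worker processes. `RandomImageDataset` and `make_loader` are hypothetical names, and the random images stand in for real files.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class RandomImageDataset(Dataset):
    """Hypothetical dataset; real code would read and augment images from disk here."""

    def __init__(self, num_samples: int = 100, image_size: int = 32):
        self.num_samples = num_samples
        self.image_size = image_size

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        # runs inside a worker process when num_workers > 0
        g = torch.Generator().manual_seed(idx)  # deterministic per-index sample
        image = torch.rand(3, self.image_size, self.image_size, generator=g)
        label = idx % 10
        return image, label


def make_loader(batch_size: int = 16, num_workers: int = 2) -> DataLoader:
    # the training script only needs this loader, not the loading details
    return DataLoader(RandomImageDataset(), batch_size=batch_size, shuffle=True,
                      num_workers=num_workers, pin_memory=torch.cuda.is_available())
```

For example, `images, labels = next(iter(make_loader()))` yields a batch of shape `(16, 3, 32, 32)` and 16 labels. On platforms that start workers with `spawn` (Windows, macOS), iterating over a loader with workers has to happen under an `if __name__ == "__main__":` guard.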

Don't log results in every step

Typically we train our models for thousands of steps. Therefore, it is enough to log the loss and other results every n-th step to reduce the overhead. Saving intermediate results as images can be especially costly during training.
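One way to sketch this, assuming a hypothetical helper that receives the per-step loss tensors: accumulate the detached losses and call `.item()` only once per interval. Besides reducing logging overhead, this avoids forcing a GPU-to-CPU synchronization at every step. In real code, `log_fn` could be `lambda step, value: writer.add_scalar("loss", value, step)`.

```python
import torch


def log_every_n(step_losses, log_every=100, log_fn=print):
    """Log the mean loss once every `log_every` steps instead of at every step."""
    running = 0.0
    for step, loss in enumerate(step_losses, start=1):
        # accumulate a detached tensor: no graph is kept alive and there is
        # no host synchronization at every step
        running = running + loss.detach()
        if step % log_every == 0:
            # a single .item() (and host sync) per logging interval
            log_fn(step, (running / log_every).item())
            running = 0.0


log_every_n([torch.tensor(float(i)) for i in range(1, 201)], log_every=100)
```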

Use Command-line Arguments

It's very handy to use command-line arguments to set parameters during code execution (batch size, learning rate, etc.). An easy way to keep track of the arguments for an experiment is to write the Namespace returned by parse_args to a file:

...
# saves arguments to config.txt file
opt = parser.parse_args()
with open("config.txt", "w") as f:
    f.write(str(opt))
...
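Writing `str(opt)` is easy to read but hard to reload. A small alternative sketch, assuming JSON as the storage format, converts the `Namespace` to a dict with `vars()`; `build_parser`, `save_config` and `load_config` are hypothetical helpers, and the two arguments are placeholders.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Train a network for ...")
    parser.add_argument("--batch_size", type=int, default=16)
    parser.add_argument("--lr", type=float, default=1e-3)
    return parser


def save_config(opt: argparse.Namespace, path: str = "config.json") -> None:
    with open(path, "w") as f:
        json.dump(vars(opt), f, indent=2)  # vars() turns the Namespace into a dict


def load_config(path: str = "config.json") -> argparse.Namespace:
    with open(path) as f:
        return argparse.Namespace(**json.load(f))
```

Reloading the file with `load_config` gives back a `Namespace` that can be passed to the same training code, which makes rerunning an experiment straightforward.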

Use .detach() to free tensors from the graph if possible

PyTorch keeps track of all operations involving tensors that require gradients, for automatic differentiation. Use .detach() to prevent recording of unnecessary operations.
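A short illustration: the detached tensor holds the same values, is no longer tracked by autograd, and shares memory with the original instead of copying it. Storing `loss.detach()` rather than `loss` in a history list avoids keeping every step's graph alive.

```python
import torch

w = torch.randn(3, requires_grad=True)
y = w * 2                  # recorded by autograd: y.grad_fn is set
y_detached = y.detach()    # same values, no longer part of the graph

print(y.requires_grad, y_detached.requires_grad)  # True False
print(y_detached.grad_fn)                          # None
# detach() does not copy: both tensors share the same memory
print(y_detached.data_ptr() == y.data_ptr())       # True

# typical use: a loss history that does not keep the graphs alive
history = []
for _ in range(3):
    loss = (w ** 2).sum()
    history.append(loss.detach())
```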

Use .item() for printing scalar tensors

You can print variables directly; however, it's recommended to use variable.detach() or variable.item() (the latter for single-element tensors). In PyTorch versions before 0.4, you had to use .data to access the tensor of a variable.
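For example, `.item()` returns a plain Python float, which is what logging and printing usually need. It only works on single-element tensors, so larger tensors can be converted with `.detach().cpu().tolist()`:

```python
import torch

w = torch.tensor([0.5], requires_grad=True)
loss = (w * 2).sum()
value = loss.item()          # plain Python float, no graph attached
print(f"loss: {value:.4f}")  # loss: 1.0000

# for tensors with more than one element
values = (w * 2).detach().cpu().tolist()
print(values)                # [1.0]
```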

Use the call method instead of forward on a nn.Module

The two ways are not identical, as pointed out in one of the issues here: https://github.com/IgorSusmelj/pytorch-styleguide/issues/3

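# calls forward() directly: registered hooks are skipped
output = self.net.forward(input)
# they are not equal!
# goes through __call__(), which runs the hooks and forward()
output = self.net(input)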
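The difference can be observed with a forward hook: `nn.Module.__call__` runs registered hooks around `forward`, while calling `forward` directly bypasses them. A minimal demonstration:

```python
import torch
import torch.nn as nn

calls = []
net = nn.Linear(4, 2)
# the hook receives (module, inputs, output) after every call through __call__
net.register_forward_hook(lambda module, inputs, output: calls.append("hook"))

x = torch.randn(1, 4)
net.forward(x)  # bypasses __call__: the hook does not run
net(x)          # goes through __call__: the hook runs
print(calls)    # ['hook']
```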

FAQ

How to keep my experiments reproducible?

We recommend setting the following seeds at the beginning of your code:

import numpy as np
import torch

np.random.seed(1)
torch.manual_seed(1)
torch.cuda.manual_seed(1)
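These calls can be bundled into one helper. The sketch below (`set_seed` is a hypothetical name) also seeds Python's `random` module and optionally requests deterministic cuDNN kernels. Note that `deterministic=True` turns `cudnn.benchmark` off, so it conflicts with the speed-up flag discussed below, and not every PyTorch operation is guaranteed to be deterministic even then.

```python
import random

import numpy as np
import torch


def set_seed(seed: int = 1, deterministic: bool = False) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)           # seeds the CPU and all CUDA generators
    torch.cuda.manual_seed_all(seed)  # explicit for all GPUs; ignored without CUDA
    if deterministic:
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False


set_seed(1)
first = torch.rand(3)
set_seed(1)
second = torch.rand(3)
print(torch.equal(first, second))  # True
```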

How to improve training and inference speed further?

On Nvidia GPUs you can add the following line at the beginning of your code. It lets the cuDNN backend benchmark several convolution algorithms the first time it sees a given input shape and then use the fastest one. However, be aware that if you change the network input/output tensor size, this benchmarking is repeated for every new shape. This can lead to very slow runtime and out-of-memory errors. Only set this flag if your inputs and outputs always have the same shape. In our experience, this usually results in an improvement of about 20%.

torch.backends.cudnn.benchmark = True

What is a good value for compute efficiency using your tqdm + prefetch_generator pattern?

It depends on the machine used, the preprocessing pipeline and the network size. With the data on an SSD and a 1080 Ti GPU, we see a compute efficiency of almost 1.0, which is the ideal scenario. If shallow (small) networks or a slow hard disk are used, the number may drop to around 0.1-0.2 depending on your setup.

How can I have a batch size > 1 even though I don't have enough memory?

In PyTorch, virtual batch sizes (gradient accumulation) are very easy to implement: we skip the optimizer update and let backward() sum up the gradients over batch_size iterations, then perform a single optimization step.

...
# in the main loop (batch_size is the number of mini-batches to accumulate)
out = net(input)
loss = criterion(out, label)
# scale the loss so the summed gradients equal the average over the virtual batch,
# then call backward to add to the gradients without performing a step yet
(loss / batch_size).backward()
total_loss += loss.item() / batch_size
if n_iter % batch_size == batch_size-1:
    # here we perform our optimization step using a virtual batch size
    optim.step()
    optim.zero_grad()
    print('Total loss: ', total_loss)
    total_loss = 0.0
...
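The scaling can be checked numerically. In this sketch, gradients accumulated over four micro-batches of 2 samples, with each loss divided by the number of micro-batches, match the gradient of a single batch of 8 when the loss uses mean reduction. The match is only exact for models without batch-dependent layers such as `BatchNorm`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(5, 1)
criterion = nn.MSELoss()  # mean reduction
inputs, targets = torch.randn(8, 5), torch.randn(8, 1)

# reference: gradient of one full batch of 8
model.zero_grad()
criterion(model(inputs), targets).backward()
full_grad = model.weight.grad.clone()

# virtual batch: 4 micro-batches of 2, each loss scaled by 1/4
accumulation_steps = 4
model.zero_grad()
for x, y in zip(inputs.chunk(accumulation_steps), targets.chunk(accumulation_steps)):
    (criterion(model(x), y) / accumulation_steps).backward()  # .grad accumulates
accum_grad = model.weight.grad.clone()

print(torch.allclose(full_grad, accum_grad, atol=1e-6))  # True
```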

How can I adjust the learning rate during training?

We can access the learning rate directly using the instantiated optimizer as shown here:

...
for param_group in optim.param_groups:
    old_lr = param_group['lr']
    new_lr = old_lr * 0.1
    param_group['lr'] = new_lr
    print('Updated lr from {} to {}'.format(old_lr, new_lr))
...
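PyTorch also ships learning-rate schedulers in `torch.optim.lr_scheduler` that perform this update for you. A sketch with `StepLR`, assuming a decay by 10x every 10 epochs; the bare `optim.step()` call is only a placeholder for a real training epoch.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
optim = torch.optim.SGD(model.parameters(), lr=0.1)
# multiply the lr of every param group by gamma every step_size epochs
scheduler = torch.optim.lr_scheduler.StepLR(optim, step_size=10, gamma=0.1)

for epoch in range(20):
    # ... train for one epoch: optim.zero_grad(); loss.backward(); optim.step() ...
    optim.step()      # placeholder for the real training steps
    scheduler.step()  # once per epoch, after the optimizer steps

print(optim.param_groups[0]["lr"])  # approximately 0.001
```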

How to use a pretrained model as a loss (non backprop) during training

If you want to use a pretrained model such as VGG to compute a loss but not train it (e.g. Perceptual loss in style-transfer/ GANs/ Auto-encoder) you can use the following pattern:

...
# instantiate the model
pretrained_VGG = Vgg19(...)

# disable gradients (prevent training)
for p in pretrained_VGG.parameters():  # freeze all VGG parameters
    p.requires_grad = False
...
# you don't have to use the no_grad() context manager but can just run the model:
# no gradients are computed for the VGG parameters, while gradients still flow
# back through VGG to whatever produced its inputs
out_real = pretrained_VGG(input_a)
out_fake = pretrained_VGG(input_b)
loss = any_criterion(out_real, out_fake)
...
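A runnable sketch of this pattern with a small stand-in network instead of VGG (an assumption made so the example needs no download): the frozen feature extractor receives no parameter gradients, while gradients still flow through it to the trainable model that produced `fake`. Computing the target features under `torch.no_grad()` saves memory because they never need gradients.

```python
import torch
import torch.nn as nn

# hypothetical stand-in for a pretrained VGG
feature_net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
for p in feature_net.parameters():
    p.requires_grad = False
feature_net.eval()

generator = nn.Conv2d(3, 3, 3, padding=1)  # hypothetical trainable model
criterion = nn.L1Loss()

real = torch.rand(2, 3, 16, 16)
fake = generator(torch.rand(2, 3, 16, 16))

with torch.no_grad():             # target features never need gradients
    feat_real = feature_net(real)
feat_fake = feature_net(fake)     # gradients flow through to the generator
loss = criterion(feat_fake, feat_real)
loss.backward()

print(generator.weight.grad is not None)                      # True
print(all(p.grad is None for p in feature_net.parameters()))  # True
```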

Why do we use .train() and .eval() in PyTorch?

Those methods are used to switch layers such as BatchNorm2d or Dropout2d between training and inference mode. Every module that inherits from nn.Module has an attribute called training. .train() sets this attribute to True and .eval() sets it to False, recursively for all submodules. For more information on how this method is implemented, please have a look at the module code in PyTorch
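A small demonstration: `.eval()` propagates to all submodules, and in evaluation mode `Dropout` becomes the identity, so repeated forward passes give identical results.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(1, 4)

model.train()
print(model.training, model[1].training)  # True True
model.eval()                              # recursively sets training=False
print(model.training, model[1].training)  # False False
# Dropout is the identity in eval mode, so outputs are deterministic
print(torch.equal(model(x), model(x)))    # True
```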

My model uses lots of memory during Inference/ How to run a model properly for inference in PyTorch?

Make sure that no gradients get computed and stored during your code execution. You can simply use the following pattern to ensure that:

with torch.no_grad():
    # run model here
    out_tensor = net(in_tensor)
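A sketch combining both steps for inference: `.eval()` switches layers such as BatchNorm and Dropout to inference behaviour, and `torch.no_grad()` prevents building the graph. Since PyTorch 1.9 there is also `torch.inference_mode()`, a stricter variant that can be slightly faster.

```python
import torch
import torch.nn as nn

net = nn.Linear(4, 2)
in_tensor = torch.randn(3, 4)

net.eval()                    # inference behaviour for BatchNorm/Dropout
with torch.no_grad():
    out_tensor = net(in_tensor)
print(out_tensor.requires_grad)  # False: no graph was built

with torch.inference_mode():     # PyTorch 1.9+
    out_fast = net(in_tensor)
```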

How to fine-tune a pretrained model?

In PyTorch you can freeze layers. This will prevent them from being updated during an optimization step.

# you can freeze whole modules using
for p in pretrained_VGG.parameters():  # reset requires_grad
    p.requires_grad = False
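A typical fine-tuning setup, sketched with a small stand-in backbone instead of a real pretrained model: freeze the backbone, add a new task-specific head, and pass only the trainable parameters to the optimizer. Note that frozen BatchNorm layers still update their running statistics in `.train()` mode unless they are put into `.eval()`.

```python
import torch
import torch.nn as nn

# hypothetical stand-in for a pretrained backbone
backbone = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 32))
head = nn.Linear(32, 5)  # new layer for the target task
model = nn.Sequential(backbone, head)

for p in backbone.parameters():
    p.requires_grad = False

# give the optimizer only the parameters that are still trainable
optim = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-3)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable)  # 165 = 32 * 5 weights + 5 biases
```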

When to use Variable(...)?

Since PyTorch 0.4, Variable and Tensor have been merged. We don't have to explicitly create a Variable object anymore.

Is PyTorch on C++ faster than using Python?

In our experience, the C++ version is about 10% faster.

Can TorchScript / JIT speed up my code?

Todo...

Is PyTorch code using cudnn.benchmark=True faster?

In our experience, you can gain about a 20% speed-up. However, the first time you run your model it takes quite some time to benchmark the available algorithms. In some cases (loops in the forward pass, no fixed input shape, if/else in forward, etc.) this flag might result in out-of-memory or other errors.

How to use multiple GPUs for training?

Todo...

How does .detach() work in PyTorch?

It frees a tensor from the computation graph: .detach() returns a new tensor that shares the same data but is no longer tracked by autograd. A nice illustration is shown here

You like this repo?

Please give feedback on how we can improve this style guide! You can open an issue or propose changes by creating a pull request.

If you like this repo, don't forget to check out other frameworks from us:

Lightly - A computer vision framework for self-supervised learning

Comments

update popular libraries to modern versions
I've added a few changes to the libraries section to:

include a description for torchvision

update torchsummary -> torchinfo (the torchsummary repo has been updated to torchinfo)

include PyTorch native TensorBoard
opened by mrdbourke
Add files via upload
The issue body contains a Chinese translation of this style guide, compiled by 机器之心 (translated by Geek.ai and 思源) and linking back to https://github.com/IgorSusmelj/pytorch-styleguide.

opened by Leo-xxx

Use multiple GPUs for training

import torch
import torch.nn as nn
def multi_gpu(model):
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    if device != torch.device('cpu') and torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)
        print(f'Using {torch.cuda.device_count()} GPUs!')
    elif device == torch.device('cuda'):
        print('Using 1 GPU!')
    else:
        print('Using CPU!')
    model = model.to(device)
    return model, device

This is how I do it.

opened by cenjinglun 0

Who is using this styleguide?

The style guide started as a small side project for personal use. We then started using it within our company. Now I've heard from friends that some of the largest tech companies around the world are using it or extended versions of it.

I would love to learn which companies are using this style guide (if you're a user of the guide and want to be featured in the README, let me know).

Feel free to let me know if you want something else covered :)

opened by IgorSusmelj 0

Review examples in README

Hi Igor,

First of all, thank you for the repo, super cool and very useful. However, I am sorry but I have to say it: all the examples in the README use bad coding practices. Take this:

class ConvBlock(nn.Module):
    def __init__(self):
        super(ConvBlock, self).__init__()
        block = [nn.Conv2d(...)]
        block += [nn.ReLU()]
        block += [nn.BatchNorm2d(...)]
        self.block = nn.Sequential(*block)
    
    def forward(self, x):
        return self.block(x)

Why create a one-item list every time and add it to block? It doesn't make sense; it is confusing and useless. Do this instead:

class ConvBlock(nn.Module):
    def __init__(self):
        super(ConvBlock, self).__init__()
        self.block = nn.Sequential(
            nn.Conv2d(...),
            nn.ReLU(),
            nn.BatchNorm2d(...),
        )
    
    def forward(self, x):
        return self.block(x)

Cleaner and faster to code, or even better:

class ConvBlock(nn.Sequential):
    def __init__(self):
        super().__init__(
            nn.Conv2d(...),
            nn.ReLU(),
            nn.BatchNorm2d(...),
        )

No need to write the forward method.
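For reference, here is a runnable version of the last variant. It uses hypothetical in_channels, out_channels and kernel_size parameters in place of the elided arguments:

```python
import torch
import torch.nn as nn


class ConvBlock(nn.Sequential):
    """Conv -> ReLU -> BatchNorm; nn.Sequential already provides forward()."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3):
        super().__init__(
            # padding keeps the spatial size unchanged for odd kernel sizes
            nn.Conv2d(in_channels, out_channels, kernel_size, padding=kernel_size // 2),
            nn.ReLU(),
            nn.BatchNorm2d(out_channels),
        )


block = ConvBlock(3, 16)
out = block(torch.randn(2, 3, 32, 32))
print(out.shape)  # torch.Size([2, 16, 32, 32])
```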

Hope it helps and I hope to see better and better code in the future :)

opened by FrancescoSaverioZuppichini 3

Should we use BackgroundGenerator when we've had DataLoader?

I really enjoy this guide! However, I am not sure what the advantage of prefetch_generator is. It seems that PyTorch's DataLoader already supports prefetching.

Thank you!

opened by yzhang1918 6
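For context, prefetch_generator's BackgroundGenerator adds thread-based prefetching on top of an iterable, and that kind of prefetching can be sketched with the standard library. BackgroundIterator is a hypothetical name, and this is a simplified sketch, not the library's implementation. Wrapping a DataLoader this way lets the main thread fetch the next batch while it is still computing on the current one. Whether that helps on top of the DataLoader's own worker prefetching (num_workers > 0, prefetch_factor) likely depends on the workload.

```python
import queue
import threading


class BackgroundIterator:
    """Hypothetical thread-based prefetcher: a daemon thread pulls items from the
    wrapped iterable into a bounded queue while the consumer works on earlier ones."""

    def __init__(self, iterable, max_prefetch: int = 1):
        self._queue = queue.Queue(maxsize=max_prefetch)
        self._finished = False
        self._thread = threading.Thread(target=self._fill, args=(iterable,), daemon=True)
        self._thread.start()

    def _fill(self, iterable):
        try:
            for item in iterable:
                self._queue.put((True, item))  # blocks while the queue is full
        except BaseException as exc:
            self._queue.put((False, exc))  # forward the error to the consumer
            return
        self._queue.put((False, None))  # end-of-data marker

    def __iter__(self):
        return self

    def __next__(self):
        if self._finished:
            raise StopIteration
        ok, payload = self._queue.get()
        if ok:
            return payload
        self._finished = True
        if payload is not None:
            raise payload
        raise StopIteration


print(list(BackgroundIterator(range(5), max_prefetch=2)))  # [0, 1, 2, 3, 4]
```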
Some ideas for improvement + Do you need a collaborator?
Hi Team,

I've been thinking about doing something like this and you guys already have a great head start. I'd love to be a collaborator or at least a regular contributor to this project.

Speaking of which, here are some ideas on how the guide can be improved.

IDEs: While I use PyCharm myself and I love it, I know people also use Sublime Text + SFTP plugin or VSCode + Rsync plugin. Those could be added.

Exploration of high-level APIs, starting with Ignite. These have implications for how to structure the code and how to use callbacks.

Logging and experiment management: a review of which libraries work well with Python for this. I recently started a repo dedicated to MLflow with certain use cases. This has implications for how to set up the CLI for the main and subsequent procedures.

Project template. There is this one I really like.

How to set up different entry points for people to interact with your research in the repo? I am usually influenced by this repo.

And many others I forgot about.
opened by dorukhansergin 1

Owner

IgorSusmelj

Co-founder at Lightly. Degree from ETH Zurich with a focus on embedded computing and machine learning.
