Diverse Branch Block: Building a Convolution as an Inception-like Unit

Overview

Diverse Branch Block: Building a Convolution as an Inception-like Unit (PyTorch) (CVPR-2021)

DBB is a powerful ConvNet building block to replace regular conv. It improves the performance without any extra inference-time costs. This repo contains the code for building DBB and converting it into a single conv. You can also get the equivalent kernel and bias in a differentiable way at any time (get_equivalent_kernel_bias in diversebranchblock.py). This may help training-based pruning or quantization.

This is the PyTorch implementation. The MegEngine version is at https://github.com/megvii-model/DiverseBranchBlock

Paper: https://arxiv.org/abs/2103.13425

Update: released the code for building the block, transformations and verification.

Update: a more efficient implementation of BNAndPadLayer

Sometimes I call it ACNet v2 because 'DBB' is two bits larger than 'ACB' in ASCII. (lol)

We provide the trained models and a super simple PyTorch-official-example-style training script to reproduce the results.

Abstract

We propose a universal building block of Convolutional Neural Network (ConvNet) to improve the performance without any inference-time costs. The block is named Diverse Branch Block (DBB), which enhances the representational capacity of a single convolution by combining diverse branches of different scales and complexities to enrich the feature space, including sequences of convolutions, multi-scale convolutions, and average pooling. After training, a DBB can be equivalently converted into a single conv layer for deployment. Unlike the advancements of novel ConvNet architectures, DBB complicates the training-time microstructure while maintaining the macro architecture, so that it can be used as a drop-in replacement for regular conv layers of any architecture. In this way, the model can be trained to reach a higher level of performance and then transformed into the original inference-time structure for inference. DBB improves ConvNets on image classification (up to 1.9% higher top-1 accuracy on ImageNet), object detection and semantic segmentation.

Use our pretrained models

You may download the models reported in the paper from Google Drive (https://drive.google.com/drive/folders/1BPuqY_ktKz8LvHjFK5abD0qy3ESp8v6H?usp=sharing) or Baidu Cloud (https://pan.baidu.com/s/1wPaQnLKyNjF_bEMNRo4z6Q, the access code is "dbbk"). Currently only ResNet-18 models are available; the others will be released very soon. To ease transfer learning on other tasks, we provide both training-time and inference-time models. Taking ResNet-18 as an example, and assuming IMGNET_PATH is the path to the directory that contains the "train" and "val" directories of ImageNet, you may test the accuracy by running

python test.py IMGNET_PATH train ResNet-18_DBB_7101.pth -a ResNet-18 -t DBB

Here "train" indicates the training-time structure.

Convert the training-time models into inference-time

You may convert a trained model into the inference-time structure with

python convert.py [weights file of the training-time model to load] [path to save] -a [architecture name]

For example,

python convert.py ResNet-18_DBB_7101.pth ResNet-18_DBB_7101_deploy.pth -a ResNet-18

Then you may test the inference-time model by

python test.py IMGNET_PATH deploy ResNet-18_DBB_7101_deploy.pth -a ResNet-18 -t DBB

Note that the argument "deploy" builds an inference-time model.

ImageNet training

The multi-processing training script in this repo is based on the official PyTorch example for simplicity and readability. The modifications are the model-building part and the cosine learning rate scheduler. You may train and test like this:

python train.py -a ResNet-18 -t DBB --dist-url tcp://127.0.0.1:23333 --dist-backend nccl --multiprocessing-distributed --world-size 1 --rank 0 --workers 64 IMGNET_PATH
python test.py IMGNET_PATH train model_best.pth.tar -a ResNet-18
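
The cosine learning rate schedule mentioned above can be sketched as follows. This is the generic cosine annealing formula, not a copy of the scheduler in train.py; the function name cosine_lr, the base rate 0.1, and the 100-step horizon are illustrative assumptions, and the actual script may differ (for example in whether it steps per iteration or per epoch).

```python
import math

def cosine_lr(base_lr, step, total_steps, min_lr=0.0):
    """Cosine-annealed learning rate at `step` out of `total_steps`."""
    # Clamp progress to [0, 1] so out-of-range steps stay well defined.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Hypothetical settings: start at 0.1 and decay to 0 over 100 steps.
schedule = [cosine_lr(0.1, s, 100) for s in range(0, 101, 25)]
print(schedule)
```

The rate starts at base_lr, passes through half of it at the midpoint, and reaches min_lr at the end.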

Use like this in your own code

Assume your model is like

class SomeModel(nn.Module):
    def __init__(self, ...):
        ...
        self.some_conv = nn.Conv2d(...)
        self.some_bn = nn.BatchNorm2d(...)
        ...
        
    def forward(self, inputs):
        out = ...
        out = self.some_bn(self.some_conv(out))
        ...

For training, just use DiverseBranchBlock to replace the conv-BN. Then SomeModel will be like

class SomeModel(nn.Module):
    def __init__(self, ...):
        ...
        self.some_dbb = DiverseBranchBlock(..., deploy=False)
        ...
        
    def forward(self, inputs):
        out = ...
        out = self.some_dbb(out)
        ...

Train the model just like you train the other regular models. Then call switch_to_deploy of every DiverseBranchBlock, test, and save.

model = SomeModel(...)
train(model)
for m in model.modules():
    if hasattr(m, 'switch_to_deploy'):
        m.switch_to_deploy()
test(model)
save(model)

FAQs

Q: Is the inference-time model's output the same as the training-time model?

A: Yes. You can verify that by

python dbb_verify.py
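
The idea behind such an equivalence check can be illustrated on the simplest building step, folding an inference-mode batch norm into the preceding conv. The sketch below is a single-channel, 1-D toy in plain Python, not the code in dbb_verify.py; all function names and numbers are illustrative assumptions.

```python
import math

def conv1d(x, kernel, bias=0.0):
    """Valid (unpadded) 1-D cross-correlation of a single-channel signal."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]

def batchnorm_eval(y, mean, var, gamma, beta, eps=1e-5):
    """Inference-mode batch norm with fixed running statistics."""
    std = math.sqrt(var + eps)
    return [gamma * (v - mean) / std + beta for v in y]

def fuse_conv_bn(kernel, mean, var, gamma, beta, eps=1e-5):
    """Fold the BN into a bias-free conv: returns (fused kernel, fused bias)."""
    scale = gamma / math.sqrt(var + eps)
    return [w * scale for w in kernel], beta - mean * scale

# Hypothetical input, kernel, and BN statistics.
x = [0.5, -1.0, 2.0, 0.25, 1.5, -0.75]
kernel = [0.2, -0.4, 0.1]
mean, var, gamma, beta = 0.3, 1.7, 0.9, -0.2

two_step = batchnorm_eval(conv1d(x, kernel), mean, var, gamma, beta)
fused_kernel, fused_bias = fuse_conv_bn(kernel, mean, var, gamma, beta)
one_step = conv1d(x, fused_kernel, fused_bias)

max_diff = max(abs(a - b) for a, b in zip(two_step, one_step))
print(max_diff)
```

The two outputs agree up to floating-point rounding, which is the same kind of comparison one would make between a training-time block and its converted conv.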

Q: What is the relationship between DBB and RepVGG?

A: RepVGG is a plain architecture, and the RepVGG-style structural re-parameterization is designed for that plain architecture. On a non-plain architecture, a RepVGG block shows no superiority over a single 3x3 conv (it improves ResNet-50 by only 0.03%, as reported in the RepVGG paper). DBB is a universal building block that can be used in numerous architectures.

Q: How to quantize a model with DBB?

A1: Post-training quantization. After training and conversion, you may quantize the converted model with any post-training quantization method. Then you may insert a BN after the conv converted from a DBB and finetune to recover the accuracy just like you quantize and finetune the other models. This is the recommended solution.

A2: Quantization-aware training. During quantization-aware training, instead of constraining the params of a single kernel (e.g., making every param lie in {-127, -126, ..., 126, 127} for int8) as you would for an ordinary conv, you should constrain the equivalent kernel of a DBB (obtained via get_equivalent_kernel_bias()).
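
As a rough illustration of what "constraining the equivalent kernel" means, the sketch below rounds a flat list of weights onto a symmetric int8 grid. The list stands in for the kernel returned by get_equivalent_kernel_bias(); the helper name quantize_symmetric_int8 and the sample values are assumptions, and a real quantization-aware setup would operate on tensors and typically pair the rounding with a gradient estimator, which is outside this sketch.

```python
def quantize_symmetric_int8(weights):
    """Map weights to integers in [-127, 127] sharing a single scale."""
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return [0] * len(weights), 1.0
    scale = max_abs / 127.0
    # Round to the nearest grid point and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

# Hypothetical flattened values of an equivalent kernel.
equivalent_kernel = [0.31, -0.07, 0.002, -0.5, 0.12]
q, scale = quantize_symmetric_int8(equivalent_kernel)
dequantized = [v * scale for v in q]
print(q, scale)
```

The point of A2 is that this rounding applies to the merged kernel, since that is the only kernel that exists at inference time; the individual branch kernels are not constrained separately.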

Q: I tried to finetune your model with multiple GPUs but got an error. Why are the names of params like "xxxx.weight" in the downloaded weight file but sometimes like "module.xxxx.weight" (shown by nn.Module.named_parameters()) in my model?

A: DistributedDataParallel may prefix "module." to the name of params and cause a mismatch when loading weights by name. The simplest solution is to load the weights (model.load_state_dict(...)) before DistributedDataParallel(model). Otherwise, you may insert "module." before the names like this

checkpoint = torch.load(...)    # This is just a name-value dict
ckpt = {('module.' + k) : v for k, v in checkpoint.items()}
model.load_state_dict(ckpt)

Likewise, if the param names in the checkpoint file start with "module." but those in your model do not, you may strip the names like

ckpt = {k.replace('module.', ''):v for k,v in checkpoint.items()}   # strip the names
model.load_state_dict(ckpt)
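
Both directions can be wrapped in small helpers that touch only a leading "module." prefix. This is a sketch on plain dicts; the helper names add_prefix and strip_prefix and the example keys are hypothetical.

```python
def add_prefix(state_dict, prefix="module."):
    """Prepend `prefix` to every key that does not already start with it."""
    return {(k if k.startswith(prefix) else prefix + k): v
            for k, v in state_dict.items()}

def strip_prefix(state_dict, prefix="module."):
    """Remove `prefix` from the start of keys; leave other keys unchanged."""
    return {(k[len(prefix):] if k.startswith(prefix) else k): v
            for k, v in state_dict.items()}

# Hypothetical checkpoint keys; the values would be tensors in practice.
checkpoint = {"module.stage1.0.conv1.weight": 1, "module.some_module.bias": 2}
stripped = strip_prefix(checkpoint)
restored = add_prefix(stripped)
print(sorted(stripped))
```

Unlike str.replace, which removes every occurrence of "module." in a key, the prefix check leaves a name such as "some_module.bias" intact.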

Q: So a DBB derives the equivalent KxK kernels before each forwarding to save computations?

A: No! More precisely, we do the conversion only once right after training. Then the training-time model can be discarded, and every resultant block is just a KxK conv. We only save and use the resultant model.

Contact

[email protected]

Google Scholar Profile: https://scholar.google.com/citations?user=CIjw0KoAAAAJ&hl=en

My open-sourced papers and repos:

Simple and powerful VGG-style ConvNet architecture (preprint, 2021): RepVGG: Making VGG-style ConvNets Great Again (https://github.com/DingXiaoH/RepVGG)

State-of-the-art channel pruning (preprint, 2020): Lossless CNN Channel Pruning via Decoupling Remembering and Forgetting (https://github.com/DingXiaoH/ResRep)

CNN component (ICCV 2019): ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks (https://github.com/DingXiaoH/ACNet)

Channel pruning (CVPR 2019): Centripetal SGD for Pruning Very Deep Convolutional Networks with Complicated Structure (https://github.com/DingXiaoH/Centripetal-SGD)

Channel pruning (ICML 2019): Approximated Oracle Filter Pruning for Destructive CNN Width Optimization (https://github.com/DingXiaoH/AOFP)

Unstructured pruning (NeurIPS 2019): Global Sparse Momentum SGD for Pruning Very Deep Neural Networks (https://github.com/DingXiaoH/GSM-SGD)

Comments
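  • A question about the caveat in TRANS III

    Hello, and sorry to bother you! I have been reading your paper and came across TRANS III. The transformation is really novel, but I do not quite understand the caveat you mention: if the second layer (the K*K conv) zero-pads its input, Eq. 8 no longer holds, and the fix is to pad with REP(b1), the bias of the first conv after its equivalent conversion. Could you explain in detail why the equation fails and why this fix works? Thank you!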

    opened by Orangerccc 10
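  • Converting the model

    Hello, when I downloaded the models from Google Drive / Baidu Cloud, I found that resnet18 is a folder with no model inside, while resnet50 does have a model. However, when I convert it with convert.py, line 27 (train_model.load_state_dict(ckpt)) raises an error about mismatched keys. The error message is as follows (partly omitted): RuntimeError: Error(s) in loading state_dict for ResNet: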
    Missing key(s) in state_dict: "stage1.0.conv2.dbb_avg.bn.bn.weight", "stage1.0.conv2.dbb_avg.bn.bn.bias", "stage1.0.conv2.dbb_avg.bn.bn.running_mean", "stage1.0.conv2.dbb_avg.bn.bn.running_var", " stage1.0.conv2.dbb_1x1_kxk.bn1.bn.weight", "stage1.0.conv2.dbb_1x1_kxk.bn1.bn.bias", "stage1.0.conv2.dbb_1x1_kxk.bn1.bn.running_mean", "stage1.0.conv2.dbb_1x1_kxk.bn1.bn.running_var", "stage1.1.conv2.dbb_avg.bn.bn.weight", "stage1.1.conv2.dbb_avg.bn.bn.bias", "stage1.1.conv2.dbb_avg.bn.bn.running_mean", "stage1.1.conv2.dbb_avg.bn.bn.running_var", "stage1.1.conv2.dbb_1x1_kxk.bn1.bn.weight", "stage1.1.conv2.dbb_1x1_kxk.bn1.bn.bias", "stage1.1.conv2.dbb_1x1_kxk.bn1.bn.running_mean", "stage1.1.conv2.dbb_1x1_kxk.bn1.bn.running_var", "stage1.2.conv2.dbb_avg.bn.bn.weight", "stage1.2.conv2.dbb_avg.bn.bn.bias", "stage1.2.conv2.dbb_avg.bn.bn.running_mean", "stage1.2.conv2.dbb_avg.bn.bn.running_var", "stage1.2.conv2.dbb_1x1_kxk.bn1.bn.weight", "stage1.2.conv2.dbb_1x1_kxk.bn1.bn.bias", "stage1.2.conv2.dbb_1x1_kxk.bn1.bn.running_mean", "stage1.2.conv2.dbb_1x1_kxk.bn1.bn.running_var"........

    Unexpected key(s) in state_dict: "stage1.0.conv2.dbb_avg.bn.weight", "stage1.0.conv2.dbb_avg.bn.bias", "stage1.0.conv2.dbb_avg.bn.running_mean", "stage1.0.conv2.dbb_avg.bn.running_var", "stage1.0conv2.dbb_avg.bn.num_batches_tracked", "stage1.0.conv2.dbb_1x1_kxk.bn1.weight", "stage1.0.conv2.dbb_1x1_kxk.bn1.bias", "stage1.0.conv2.dbb_1x1_kxk.bn1.running_mean", "stage1.0.conv2.dbb_1x1_kxk.bn1.running_var", "stage1.0.conv2.dbb_1x1_kxk.bn1.num_batches_tracked", "stage1.1.conv2.dbb_avg.bn.weight", "stage1.1.conv2.dbb_avg.bn.bias", "stage1.1.conv2.dbb_avg.bn.running_mean", "stage1.1.conv2.dbb_avg.bn.runing_var", "stage1.1.conv2.dbb_avg.bn.num_batches_tracked", "stage1.1.conv2.dbb_1x1_kxk.bn1.weight", "stage1.1.conv2.dbb_1x1_kxk.bn1.bias", "stage1.1.conv2.dbb_1x1_kxk.bn1.running_mean", "stage1.1.conv2.dbb_1x1_kxk.bn1.running_var", "stage1.1.conv2.dbb_1x1_kxk.bn1.num_batches_tracked", "stage1.2.conv2.dbb_avg.bn.weight", "stage1.2.conv2.dbb_avg.bn.bias", "stage1.2.conv2.dbb_avg.bn.running_mean", "stage1.conv2.dbb_avg.bn.running_var", "stage1.2.conv2.dbb_avg.bn.num_batches_tracked", "stage1.2.conv2.dbb_1x1_kxk.bn1.weight", "stage1.2.conv2.dbb_1x1_kxk.bn1.bias", "stage1.2.conv2.dbb_1x1_kxk.bn1.running_mean", "stage1.2.conv2.dbb_1x1_kxk.bn1.running_var", "stage1.2.conv2.dbb_1x1_kxk.bn1.num_batches_tracked", "stage2.0.conv2.dbb_avg.bn.weight", "stage2.0.conv2.dbb_avg.bn.bias", "stage2.0.conv2.dbb_avg.bn.runing_mean", "stage2.0.conv2.dbb_avg.bn.running_var".......

    opened by dada-thu 3
  • transIII_1x1_kxk does not behave as expected

    Hi,

    I verified like this:

    conv1 = nn.Conv2d(32, 64, 1, 1, 0, bias=True)
    conv2 = nn.Conv2d(64, 128, 3, 1, 1, bias=True)
    conv = nn.Conv2d(32, 128, 3, 1, 1, bias=True)
    
    k, b = transIII_1x1_kxk(conv1.weight, conv1.bias, conv2.weight, conv2.bias, 1)
    conv.weight.copy_(k)
    conv.bias.copy_(b)
    inten = torch.randn(2, 32, 224, 224)
    out1 = conv2(conv1(inten))
    out2 = conv(inten)
    print((out1 - out2).abs().max())
    

    And the output is 0.11, which is far too large. Have you noticed this?

    opened by CoinCheung 2
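
One plausible reading of this discrepancy, consistent with the TRANS III caveat quoted in the first comment, is a border effect: when the first conv has a bias and the second conv zero-pads its input, the merged conv differs from the two-step result at the borders only. The single-channel, 1-D toy below is an illustrative sketch, not the repository's transIII_1x1_kxk; all names and numbers are assumptions.

```python
def conv1d_padded(x, kernel, bias, pad_value):
    """Stride-1 cross-correlation with a size-3 kernel and one padded element per side."""
    padded = [pad_value] + list(x) + [pad_value]
    return [sum(kernel[j] * padded[i + j] for j in range(3)) + bias
            for i in range(len(x))]

def max_abs_diff(a, b):
    return max(abs(u - v) for u, v in zip(a, b))

# Hypothetical parameters: a 1x1 conv (y = w1 * x + b1) followed by a size-3 conv.
w1, b1 = 0.8, 0.5
k2, b2 = [0.3, -0.2, 0.6], 0.1

x = [1.0, -2.0, 0.5, 3.0, -1.0]
y = [w1 * v + b1 for v in x]          # output of the 1x1 conv

# Merged conv: kernel w1 * k2, bias b2 + b1 * sum(k2), zero padding on x.
merged = conv1d_padded(x, [w1 * k for k in k2], b2 + b1 * sum(k2), pad_value=0.0)

zero_padded = conv1d_padded(y, k2, b2, pad_value=0.0)  # second conv zero-pads y
bias_padded = conv1d_padded(y, k2, b2, pad_value=b1)   # second conv pads y with b1

print(max_abs_diff(merged, zero_padded))
print(max_abs_diff(merged, bias_padded))
```

In this toy the interior outputs agree in both cases; only the first and last positions differ under zero padding, by b1 times the kernel tap that falls on the padding. Padding the intermediate feature map with b1 removes the gap.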
  • Worse performance with DiverseBlock for Cifar10, ResNet18

    Hello.

    Thank you for your interesting work, and code.

    I tried using your Diverseblock in ResNet18 (according to your instructions, replacing conv+bn with diverse blocks). My code is based on https://github.com/kuangliu/pytorch-cifar. The accuracy drops from 95.4% to 95.1%. Do you have any ideas for why this is?

    Thank you.

    opened by WilhelmT 1
  • Multi-class classification performance of ResNet-18 with DBB

    Hello! After reading your paper, I tried using a ResNet-18 built with DBB blocks for my own multi-class classification task, as follows:

    import torch
    import torch.nn as nn
    from DiverseBranchBlock.convnet_utils import switch_deploy_flag, switch_conv_bn_impl, build_model

    def Dbb_Res(num_classes, pretrained=True):
        switch_deploy_flag(False)
        switch_conv_bn_impl('DBB')
        model = build_model('ResNet-18')

        if pretrained == True:
            model.load_state_dict(torch.load('DiverseBranchBlock\ResNet-18_DBB_7099.pth'))

        in_features = model.linear.in_features
        model.linear = nn.Linear(in_features, num_classes)
        return model

    In practice, however, the results are terrible: a pretrained ResNet-18 reaches 80% accuracy, while the network built as above reaches only 6%. Am I calling it incorrectly, and how should I adjust it? Thank you!

    opened by HOLYlmx 0
  • why padding == kernel_size // 2 is asserted?

    https://github.com/DingXiaoH/DiverseBranchBlock/blob/be15be76a5556e04b2b44411a69994abcd1f25eb/diversebranchblock.py#L105 Why should padding be equal to kernel_size // 2? What about Conv2d(kernel_size=4, stride=2, padding=1)?

    opened by PennyPeng369 1
  • Maybe need to reverse `H_pixels_to_pad` & `W_pixels_to_pad`?

    Hi, I wonder whether this should be F.pad(kernel, [W_pixels_to_pad, W_pixels_to_pad, H_pixels_to_pad, H_pixels_to_pad]), since F.pad expects the padding in the order [padding_left, padding_right, padding_top, padding_bottom]: https://github.com/DingXiaoH/DiverseBranchBlock/blob/cd627d5089eaa25dedaa258b189fde508586a2f7/dbb_transforms.py#L44

    Best

    opened by CiaoHe 0
  • ValueError: some parameters appear in more than one parameter group

    Hi, when I used DiverseBranchBlock to replace Conv-BN in my network, I got this error: "ValueError: some parameters appear in more than one parameter group". Have you seen it before?

    opened by lidehuihxjz 2