Count the MACs / FLOPs of your PyTorch model.

Ligeng Zhu

Last update: Dec 29, 2022

Related tags

Deep Learning pytorch-OpCounter

Overview

THOP: PyTorch-OpCounter

How to install

pip install thop (now continuously integrated on GitHub Actions)

pip install --upgrade git+https://github.com/Lyken17/pytorch-OpCounter.git

How to use

Basic usage

import torch
from torchvision.models import resnet50
from thop import profile

model = resnet50()
input = torch.randn(1, 3, 224, 224)
macs, params = profile(model, inputs=(input, ))

Define the rule for a 3rd-party module

import torch.nn as nn

class YourModule(nn.Module):
    # your definition
    pass

def count_your_model(model, x, y):
    # your rule here
    pass

input = torch.randn(1, 3, 224, 224)
macs, params = profile(model, inputs=(input, ),
                        custom_ops={YourModule: count_your_model})

Improve the output readability

Call thop.clever_format to print the results in a more readable format.

from thop import clever_format
macs, params = clever_format([macs, params], "%.3f")
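
To illustrate what a more readable format looks like, here is a minimal sketch of a unit-suffix formatter. The helper human_readable is hypothetical and is not thop's clever_format; the suffix thresholds (K, M, G, T as powers of 1000) are an assumption made for this example. The sample values are the raw ResNet-50 figures quoted in one of the comments further down this page.

```python
def human_readable(value, fmt="%.3f"):
    """Hypothetical helper: scale a raw count and append a K/M/G/T suffix."""
    for threshold, suffix in ((1e12, "T"), (1e9, "G"), (1e6, "M"), (1e3, "K")):
        if value >= threshold:
            return fmt % (value / threshold) + suffix
    return fmt % value  # small values are printed without a suffix


macs_text = human_readable(4142713856.0)   # raw MACs
params_text = human_readable(25557032.0)   # raw parameter count
print(macs_text, params_text)
```

Read this way, the raw numbers line up with the Params(M) and MACs(G) columns used in the results table.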

Results of Recent Models

The implementations are adapted from torchvision. The following results can be reproduced with benchmark/evaluate_famous_models.py.

Model	Params(M)	MACs(G)
alexnet	61.10	0.77
vgg11	132.86	7.74
vgg11_bn	132.87	7.77
vgg13	133.05	11.44
vgg13_bn	133.05	11.49
vgg16	138.36	15.61
vgg16_bn	138.37	15.66
vgg19	143.67	19.77
vgg19_bn	143.68	19.83
resnet18	11.69	1.82
resnet34	21.80	3.68
resnet50	25.56	4.14
resnet101	44.55	7.87
resnet152	60.19	11.61
wide_resnet101_2	126.89	22.84
wide_resnet50_2	68.88	11.46

Model	Params(M)	MACs(G)
resnext50_32x4d	25.03	4.29
resnext101_32x8d	88.79	16.54
densenet121	7.98	2.90
densenet161	28.68	7.85
densenet169	14.15	3.44
densenet201	20.01	4.39
squeezenet1_0	1.25	0.82
squeezenet1_1	1.24	0.35
mnasnet0_5	2.22	0.14
mnasnet0_75	3.17	0.24
mnasnet1_0	4.38	0.34
mnasnet1_3	6.28	0.53
mobilenet_v2	3.50	0.33
shufflenet_v2_x0_5	1.37	0.05
shufflenet_v2_x1_0	2.28	0.15
shufflenet_v2_x1_5	3.50	0.31
shufflenet_v2_x2_0	7.39	0.60
inception_v3	27.16	5.75

Comments

How to set counter for nn.functional.interpolate?

My model uses nn.functional.interpolate, and in my case its computation should be counted as well.

nn.functional.interpolate: count_upsample_bilinear was appended in register_hooks in profile.py, and count_upsample_bilinear was implemented in count_hooks.py, but it did not work.

I also tried to print m_type in add_hooks in profile, but the output did not include nn.functional.interpolate. It seems that we cannot add hooks to some APIs like nn.functional.interpolate which work differently from APIs from torch.nn in terms of parameters and handler. I wonder if we can only wrap them into 3rd party modules to set FLOPs counters for them like the example in README.md.

Are there any workarounds that require only minimal changes to the model definition code?

opened by fengyuentau 9

Is the count_conv2d for FLOPs?

I think the count_conv2d function counts MACCs (multiply-accumulates), i.e. multiplications, rather than FLOPs. In this function, total_ops is calculated as K x K x Cin x Wout x Hout x Cout. Isn't that the MACC count?
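
As a worked example of the K x K x Cin x Wout x Hout x Cout product, the sketch below evaluates it for one layer. The function name conv2d_macs and the sample layer (a 7x7 convolution from 3 to 64 channels with a 112x112 output) are hypothetical and not taken from THOP; the conversion of one MAC into two FLOPs (one multiply plus one add) is a common convention that is assumed here, not something this page states.

```python
def conv2d_macs(kernel_size, c_in, c_out, h_out, w_out):
    """Multiply-accumulates for one input sample, bias ignored."""
    return kernel_size * kernel_size * c_in * w_out * h_out * c_out


# Hypothetical sample layer: 7x7 conv, 3 -> 64 channels, 112x112 output.
macs = conv2d_macs(kernel_size=7, c_in=3, c_out=64, h_out=112, w_out=112)
flops = 2 * macs  # assumed convention: 1 MAC = 1 multiply + 1 add
print(macs, flops)
```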

opened by sungsooo 7

Calculation of trainable parameters is inaccurate due to storage as floating-point variable

I noticed this when testing out the module on a single, large Linear() layer:

layer = torch.nn.Linear(8153, 7533, bias=False)
...
flops, params = profile(layer, inputs=(inputs,))
print(f"{flops} FLOPS {params} parameters")

The tool returns:

843934466048.0 FLOPS 61416548.0 parameters

The latter figure is off by 1 from the expected answer of 8153*7533=61,416,549. The fact that it was printed with a decimal point was a hint, and pointed me to the following lines:

https://github.com/Lyken17/pytorch-OpCounter/blob/1ede8b613c13808d9f52ce5666a18922972592be/thop/profile.py#L72-L76

Since the above value is greater than 2^(24) = 16,777,216, it may not be perfectly represented in 32-bit floating-point format. Indeed, while p.numel() = 61416549, torch.Tensor([p.numel()]).dtype is torch.float32, so the value gets rounded to tensor([61416548.]).

Total trainable parameters should probably be stored in variables with an explicit dtype=torch.int64.
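
The rounding described above can be reproduced without PyTorch. This is a minimal sketch that uses Python's standard struct module to round a value to IEEE 754 single precision; the helper name round_to_float32 is hypothetical, and the sketch stands in for, rather than reproduces, the tensor-based code path in profile.py.

```python
import struct


def round_to_float32(value):
    """Round a number to the nearest 32-bit float and return it as a Python float."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]


exact = 8153 * 7533                  # parameter count of the Linear layer
as_float32 = round_to_float32(exact)

print(exact)                         # exact integer count
print(int(as_float32))               # value after a round trip through float32
print(exact > 2 ** 24)               # beyond the range where float32 holds every integer
```

Between 2^25 and 2^26 consecutive float32 values are 4 apart, so an odd count in that range cannot be stored exactly.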
opened by felker 6

Help me to check my custom op calculation code

This is my code for calculating the FLOPs of my custom op (F.grid_sample). Why is the result still zero MACs? (Zero params is reasonable.)

import torch
from thop.profile import profile
import torch.nn as nn
import torch.nn.functional as F

class GridSample(nn.Module):
    def __init__(self) -> None:
        super(GridSample, self).__init__()
    
    def forward(self, x, grid):
        x = F.grid_sample(x, grid, padding_mode='border', align_corners=True)
        return x

def get_macs_of_gridsample(m:GridSample,x,y):
    """
    x is the input of instance of class GridSample
    y is the output of instance of class GridSample
    """
    input, flow = x[0], x[1]
    # print(input.size())
    # print(flow.size())
    batch, C = input.size()[:2]
    Hout, Wout = flow.size()[1:3]
    """compute flops"""
    total_ops = 0
    total_ops += batch * C * Hout * Wout * 4  # bilinear interpolation
    # print(total_ops)
    m.total_ops += torch.DoubleTensor([int(total_ops)])


if __name__ == "__main__":
    device = "cpu"
    if torch.cuda.is_available():
        device = "cuda"

    # print(type(GridSample()))
    model = GridSample().to(device)
    dsize1 = (1, 4, 3, 3)
    dsize2 = (1, 3, 3, 2)
    x = torch.randn(dsize1).to(device)
    grid = torch.rand(dsize2).to(device) * 2 - 1
    custom_ops = {GridSample: get_macs_of_gridsample}
    total_macs, total_params = profile(model, (x, grid), custom_ops=custom_ops, verbose=True)
    print(f'total_ops: ', total_macs)
    print(f'total_params: ', total_params)

The results are:

[INFO] Customize rule get_macs_of_gridsample() <class '__main__.GridSample'>.
total_ops: 0
total_params: 0

opened by laisimiao 5

[WARN] Cannot find rule of module

Hello, thanks for your excellent work. There is a warning I don't understand, though. What is the "rule" of a module? Can you give an example of how to define one for an individual module?

[WARN] Cannot find rule for <class 'net.model.DC_blocks'>. Treat it as zero Macs and zero Params.

P.S. DC_blocks is my model.

opened by peylnog 5

It doesn't work for my resnet18.

When I use it to calculate the FLOPs of my resnet18, an error occurs:

[INFO] Register count_convNd() for <class 'torch.nn.modules.conv.Conv2d'>.
[INFO] Register count_bn() for <class 'torch.nn.modules.batchnorm.BatchNorm2d'>.
[INFO] Register zero_ops() for <class 'torch.nn.modules.activation.ReLU6'>.
[WARN] Cannot find rule for <class 'torch.nn.modules.container.Sequential'>. Treat it as zero Macs and zero Params.
[WARN] Cannot find rule for <class '__main__.ResidualBlock'>. Treat it as zero Macs and zero Params.
[INFO] Register count_linear() for <class 'torch.nn.modules.linear.Linear'>.
[WARN] Cannot find rule for <class '__main__.ResNet'>. Treat it as zero Macs and zero Params.

my net architecture is:

ResNet(
  (conv1): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU6()
  )
  (layer1): Sequential(
    (0): ResidualBlock(
      (left): Sequential(
        (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU6(inplace=True)
        (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (shortcut): Sequential()
    )
    (1): ResidualBlock(
      (left): Sequential(
        (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU6(inplace=True)
        (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (shortcut): Sequential()
    )
  )
  (layer2): Sequential(
    (0): ResidualBlock(
      (left): Sequential(
        (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU6(inplace=True)
        (3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (shortcut): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): ResidualBlock(
      (left): Sequential(
        (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU6(inplace=True)
        (3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (shortcut): Sequential()
    )
  )
  (layer3): Sequential(
    (0): ResidualBlock(
      (left): Sequential(
        (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU6(inplace=True)
        (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (shortcut): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): ResidualBlock(
      (left): Sequential(
        (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU6(inplace=True)
        (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (shortcut): Sequential()
    )
  )
  (layer4): Sequential(
    (0): ResidualBlock(
      (left): Sequential(
        (0): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU6(inplace=True)
        (3): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (shortcut): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): ResidualBlock(
      (left): Sequential(
        (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU6(inplace=True)
        (3): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (shortcut): Sequential()
    )
  )
  (fc): Linear(in_features=512, out_features=10, bias=True)
)

I'd like to know how to deal with this. Thank you very much.

opened by JachinMa 5

Calculation of AvgPooling

Could you please explain the calculation of the AvgPool FLOPs in more detail? Although I know that the impact of pooling layers is minimal, I'd like to understand the calculation. As far as I can tell from the code, the formula you use for 2D average pooling is:

(kernel_size + 1) * H_out * W_out * in_channels * out_channels

But if I follow the formula on the PyTorch website, I'd calculate it as follows. For each kernel applied to the input, you have kernel_size * kernel_size - 1 additions, plus one division, plus one multiplication of the kernel sizes inside the denominator. You apply out_channels kernels to in_channels input channels, which results in an H_out * W_out output. This covers all multiplications and additions. To be compliant with the regular understanding of MACs and FLOPs, you divide by 2, as two MACs result in one FLOP. So finally I get the formula:

FLOPs = (kernel_size * kernel_size - 1) * H_out * W_out * in_channels * out_channels / 2
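
To make the gap between the two expressions in this comment concrete, the sketch below evaluates both exactly as they are written. The function names and the sample sizes are hypothetical; neither function is copied from THOP's source, and the sketch takes no position on which expression is the right one.

```python
def avgpool_ops_as_read_from_code(kernel_size, h_out, w_out, in_channels, out_channels):
    """First expression in the comment: (kernel_size + 1) * H_out * W_out * Cin * Cout."""
    return (kernel_size + 1) * h_out * w_out * in_channels * out_channels


def avgpool_ops_proposed(kernel_size, h_out, w_out, in_channels, out_channels):
    """Second expression: (kernel_size^2 - 1) * H_out * W_out * Cin * Cout / 2."""
    return (kernel_size * kernel_size - 1) * h_out * w_out * in_channels * out_channels / 2


# Hypothetical sample: 2x2 pooling window, 4x4 output, a single channel.
sizes = dict(kernel_size=2, h_out=4, w_out=4, in_channels=1, out_channels=1)
from_code = avgpool_ops_as_read_from_code(**sizes)
proposed = avgpool_ops_proposed(**sizes)
print(from_code, proposed)
```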

opened by MarkusAmann 5

Restriction on reused modules might be too strict

It seems that reusing a module is banned by THOP:

https://github.com/Lyken17/pytorch-OpCounter/blob/1f4ddb7fb51c3b1a49d60708f9b857535e5dc4e1/thop/profile.py#L49-L51

However, this is not quite reasonable. Reusing a ReLU-type module is very common; it makes the code neater and causes no harm, e.g.

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.dropout = nn.Dropout(0.5)
        self.relu = nn.ReLU()
        self.lin1 = nn.Linear(4096, 4096)
        self.lin2 = nn.Linear(4096, 4096)
        self.lin3 = nn.Linear(4096, 4096)

    def forward(self, x):
        output = self.relu(self.lin1(x))
        output = self.relu(self.lin2(output))
        output = self.dropout(self.relu(self.lin3(output)))
        ....

Besides, I don't think checking the legality of a network is THOP's job. As far as I can see, this kind of restriction is unnecessary and will only raise troublesome errors.

opened by Kylin9511 5

Different result of Resnet50

Hi, thank you so much for your awesome work! I ran the following code:

from torchvision.models import resnet50
from thop import profile
model = resnet50()
flops, params = profile(model, input_size=(1, 3, 224,224))
print(flops, params)

But I get the output 4142713856.0 25557032.0, which is different from the results in the README table. I am using PyTorch 1.0.1. Could you help explain that? Thank you!

opened by hellojialee 5

When calculating the FLOPs of torch.nn.Upsample, the hook in the forward pass overrides the output of Upsample. Is this a bug or a special design?

When I calculate the FLOPs of torch.nn.Upsample, I found that the hook function for Upsample overrides the module's output during the forward pass. Is this a bug or a special design?

my Upsample:

nn.Upsample(size=None, scale_factor=2, mode='nearest')

This is the code that handles forward_hook in the pytorch framework:

result = forward_call(*input, **kwargs)
if _global_forward_hooks or self._forward_hooks:
    for hook in (*_global_forward_hooks.values(), *self._forward_hooks.values()):
        hook_result = hook(self, input, result)
        if hook_result is not None:
            result = hook_result

It can be seen that if hook_function returns a value, the output of the Upsample module will be overwritten.

In the hook function for Upsample, a return statement is executed and returns 0:

def count_upsample(m, x, y):
    if m.mode not in (
        "nearest",
        "linear",
        "bilinear",
        "bicubic",
    ):  # "trilinear"
        logging.warning("mode %s is not implemented yet, take it a zero op" % m.mode)
        return counter_zero_ops()

    # In my case, the following statement is executed
    if m.mode == "nearest":
        return counter_zero_ops()

    x = x[0]
    m.total_ops += counter_upsample(m.mode, y.nelement())

def counter_zero_ops():
    return torch.DoubleTensor([int(0)])

Version information:
thop: 0.0.5-2204160952
pytorch: 1.10.2

Therefore, in the current version of thop, calculating the FLOPs of an Upsample module destroys the module's original output value. Of course, I'm not sure whether this is a special setting.
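
The effect described in this comment can be shown without PyTorch. The sketch below is a pure-Python stand-in: TinyModule is a hypothetical class that copies only the dispatch loop quoted above (a hook's non-None return value replaces the result), and both hook functions are made up for the illustration rather than taken from thop.

```python
class TinyModule:
    """Hypothetical stand-in for nn.Module; it mimics only the forward-hook loop."""

    def __init__(self):
        self._forward_hooks = []

    def register_forward_hook(self, hook):
        self._forward_hooks.append(hook)

    def forward(self, x):
        return [v * 2 for v in x]  # placeholder computation

    def __call__(self, x):
        result = self.forward(x)
        for hook in self._forward_hooks:
            hook_result = hook(self, (x,), result)
            if hook_result is not None:
                result = hook_result  # a returned value replaces the output
        return result


def counting_hook_with_return(m, x, y):
    return 0  # like returning a zero-op counter from the hook


def counting_hook_without_return(m, x, y):
    m.total_ops = getattr(m, "total_ops", 0) + len(y)  # record, return None


broken = TinyModule()
broken.register_forward_hook(counting_hook_with_return)
fixed = TinyModule()
fixed.register_forward_hook(counting_hook_without_return)

broken_output = broken([1, 2, 3])
fixed_output = fixed([1, 2, 3])
print(broken_output)
print(fixed_output, fixed.total_ops)
```

In this sketch, a counting hook that records its result on the module and returns nothing leaves the forward output untouched.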
opened by tty0013 4

Unexpected "Cannot find rule"

Hi, I encountered an unexpected issue with inception_v3 model inside torchvision.

Hereby I give the MRE

>>> import torchvision
>>> import torch
>>> from thop import profile                                                                                       
>>> model = torchvision.models.inception_v3()                                                                                                                                                                                                                              
>>> inputs = torch.Tensor(1,3,224,224)                                                                                                                                                                                                                                     
>>> macs, params = profile(model, inputs=(inputs,))                                                                                                                                 
[INFO] Register count_convNd() for <class 'torch.nn.modules.conv.Conv2d'>.
[INFO] Register count_bn() for <class 'torch.nn.modules.batchnorm.BatchNorm2d'>.
[WARN] Cannot find rule for <class 'torchvision.models.inception.BasicConv2d'>. Treat it as zero Macs and zero Params.
[WARN] Cannot find rule for <class 'torchvision.models.inception.InceptionA'>. Treat it as zero Macs and zero Params.
[WARN] Cannot find rule for <class 'torchvision.models.inception.InceptionB'>. Treat it as zero Macs and zero Params.
[WARN] Cannot find rule for <class 'torchvision.models.inception.InceptionC'>. Treat it as zero Macs and zero Params.                                                                                                                                                      
[INFO] Register count_linear() for <class 'torch.nn.modules.linear.Linear'>.
[WARN] Cannot find rule for <class 'torchvision.models.inception.InceptionAux'>. Treat it as zero Macs and zero Params.
[WARN] Cannot find rule for <class 'torchvision.models.inception.InceptionD'>. Treat it as zero Macs and zero Params.
[WARN] Cannot find rule for <class 'torchvision.models.inception.InceptionE'>. Treat it as zero Macs and zero Params.                                                                                                                                                      
[WARN] Cannot find rule for <class 'torchvision.models.inception.Inception3'>. Treat it as zero Macs and zero Params.
>>> macs
2847217792.0  # while it should be 5.75 G as indicated in README.md

The model definition is as follows:

class BasicConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, **kwargs):
        super(BasicConv2d, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, bias=False, **kwargs)
        self.bn = nn.BatchNorm2d(out_channels, eps=0.001)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        return F.relu(x, inplace=True)

This should not happen, since "BasicConv2d" inherits from nn.Module and is thus "legal". I suppose this is a corner case that I have run into?

opened by MARMOTatZJU 4

Does it support non-image models?

Hi everyone,

I'm just wondering whether it can calculate the number of parameters or FLOPs for NLP models.

One of the examples: https://github.com/hangyang-nlp/de-ppn

opened by SingCheng 1

How do I calculate the FLOPs of a model with some frozen layers during training?

Thank you so much for sharing the code.

As far as I know, frozen layers have no effect on FLOPs in the forward pass. That is, even if frozen layers are included, FLOPs is only affected by the total number of parameters.

So, how do I estimate the FLOPs in the backward pass when there are some frozen layers in the model? Is it correct to simply calculate 2 * forward_FLOPs?

I wonder if this code reflects my question. If not, can someone please help me?

opened by js-lee-AI 0

AttributeError: 'PReLU' object has no attribute 'total_params'

I get this error when I run YOLOv5 with the PReLU activation, so I tried to write a demo to reproduce it:

import torch
import torch.nn as nn
import thop

class test_act(nn.Module):
    default_act = nn.PReLU()  # default activation

    def __init__(self):
        super().__init__()
        self.act = self.default_act

    def forward(self, x):
        return self.act(x)

class C3(nn.Module):
    # CSP Bottleneck with 3 convolutions
    def __init__(self,):  # ch_in, ch_out, number, shortcut, groups, expansion
        super().__init__()
        self.cv1 = test_act()
        self.cv2 = test_act()

    def forward(self, x):
        return self.cv2(self.cv1(x))

class BaseA(nn.Module):
    # YOLOv5 base model
    def __init__(self):
        super().__init__()
        layers = []
        for i in range(100):
            m_ = nn.Sequential((C3()))
            layers.append(m_)
        self.model = nn.Sequential(*layers)

    def forward(self, x, profile=False):
        for m in self.model:
            o = thop.profile(m, inputs=(x), verbose=False)
            x = m(x)
        return x

model=BaseA().to(1)
im = torch.rand(1, 3, 640, 640).to(1)
model(im, profile=True)

opened by colchicinewf 0

[discuss] How about setting the default value of profile's verbose to False?

Currently:

def profile(
    model: nn.Module,
    inputs,
    custom_ops=None,
    verbose=True,
    ret_layer_info=False,
    report_missing=False,
):

How about setting verbose = False by default to reduce unnecessary output?

opened by Freed-Wu 0
