Code for the paper "AdderNet: Do We Really Need Multiplications in Deep Learning?"

Overview

AdderNet: Do We Really Need Multiplications in Deep Learning?

This code is a demo of the CVPR 2020 paper AdderNet: Do We Really Need Multiplications in Deep Learning?

We present adder networks (AdderNets) to trade the massive multiplications in deep neural networks, especially convolutional neural networks (CNNs), for much cheaper additions to reduce computation costs. In AdderNets, we take the L1-norm distance between the filters and the input features as the output response. As a result, the proposed AdderNets achieve 74.9% Top-1 accuracy and 91.7% Top-5 accuracy with ResNet-50 on the ImageNet dataset, without any multiplications in the convolutional layers.
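For illustration, here is a minimal sketch of an adder layer's forward pass (an editor's example, not the repo's adder.py; it favors clarity over memory efficiency). The output is the negative L1 distance between each filter and each input patch, so the layer itself needs no multiplications:

    import torch
    import torch.nn.functional as F

    def adder2d_forward(x, weight, stride=1, padding=0):
        # x: (N, C_in, H, W); weight: (C_out, C_in, k, k)
        n, _, h, w = x.shape
        c_out, _, k, _ = weight.shape
        h_out = (h + 2 * padding - k) // stride + 1
        w_out = (w + 2 * padding - k) // stride + 1
        # Sliding patches: (N, C_in*k*k, L) with L = h_out * w_out.
        patches = F.unfold(x, k, stride=stride, padding=padding)
        w_flat = weight.view(c_out, -1)
        # Output response: negative L1 distance between filters and patches,
        # i.e. only subtractions, absolute values, and sums.
        out = -(patches.unsqueeze(1) - w_flat[None, :, :, None]).abs().sum(dim=2)
        return out.view(n, c_out, h_out, w_out)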

UPDATE: The training code was released on June 28.

Run python main.py to train on CIFAR-10.

UPDATE: The model zoo for AdderNets was released on November 27.

Classification results on the CIFAR-10 and CIFAR-100 datasets.

| Model     | Method           | CIFAR-10            | CIFAR-100 |
|-----------|------------------|---------------------|-----------|
| VGG-small | ANN [1]          | 93.72%              | 74.58%    |
| VGG-small | PKKD ANN [2]     | 95.03%              | 76.94%    |
| ResNet-20 | ANN              | 92.02%              | 67.60%    |
| ResNet-20 | PKKD ANN         | 92.96%              | 69.93%    |
| ResNet-20 | ShiftAddNet* [3] | 89.32% (160 epochs) | -         |
| ResNet-32 | ANN              | 93.01%              | 69.17%    |
| ResNet-32 | PKKD ANN         | 93.62%              | 72.41%    |

Classification results on the ImageNet dataset.

| Model     | Method       | Top-1 Acc | Top-5 Acc |
|-----------|--------------|-----------|-----------|
| ResNet-18 | CNN          | 69.8%     | 89.1%     |
| ResNet-18 | ANN [1]      | 67.0%     | 87.6%     |
| ResNet-18 | PKKD ANN [2] | 68.8%     | 88.6%     |
| ResNet-50 | CNN          | 76.2%     | 92.9%     |
| ResNet-50 | ANN          | 74.9%     | 91.7%     |
| ResNet-50 | PKKD ANN     | 76.8%     | 93.3%     |

Super-Resolution results on several SR datasets.

| Scale | Model | Method  | Set5 (PSNR/SSIM) | Set14 (PSNR/SSIM) | B100 (PSNR/SSIM) | Urban100 (PSNR/SSIM) |
|-------|-------|---------|------------------|-------------------|------------------|----------------------|
| ×2    | VDSR  | CNN     | 37.53/0.9587     | 33.03/0.9124      | 31.90/0.8960     | 30.76/0.9140         |
| ×2    | VDSR  | ANN [4] | 37.37/0.9575     | 32.91/0.9112      | 31.82/0.8947     | 30.48/0.9099         |
| ×2    | EDSR  | CNN     | 38.11/0.9601     | 33.92/0.9195      | 32.32/0.9013     | 32.93/0.9351         |
| ×2    | EDSR  | ANN     | 37.92/0.9589     | 33.82/0.9183      | 32.23/0.9000     | 32.63/0.9309         |
| ×3    | VDSR  | CNN     | 33.66/0.9213     | 29.77/0.8314      | 28.82/0.7976     | 27.14/0.8279         |
| ×3    | VDSR  | ANN     | 33.47/0.9151     | 29.62/0.8276      | 28.72/0.7953     | 26.95/0.8189         |
| ×3    | EDSR  | CNN     | 34.65/0.9282     | 30.52/0.8462      | 29.25/0.8093     | 28.80/0.8653         |
| ×3    | EDSR  | ANN     | 34.35/0.9212     | 30.33/0.8420      | 29.13/0.8068     | 28.54/0.8555         |
| ×4    | VDSR  | CNN     | 31.35/0.8838     | 28.01/0.7674      | 27.29/0.7251     | 25.18/0.7524         |
| ×4    | VDSR  | ANN     | 31.27/0.8762     | 27.93/0.7630      | 27.25/0.7229     | 25.09/0.7445         |
| ×4    | EDSR  | CNN     | 32.46/0.8968     | 28.80/0.7876      | 27.71/0.7420     | 26.64/0.8033         |
| ×4    | EDSR  | ANN     | 32.13/0.8864     | 28.57/0.7800      | 27.58/0.7368     | 26.33/0.7874         |

*ShiftAddNet [3] used a different training setting.

[1] AdderNet: Do We Really Need Multiplications in Deep Learning? Hanting Chen, Yunhe Wang, Chunjing Xu, Boxin Shi, Chao Xu, Qi Tian, Chang Xu. CVPR, 2020. (Oral)

[2] Kernel Based Progressive Distillation for Adder Neural Networks. Yixing Xu, Chang Xu, Xinghao Chen, Wei Zhang, Chunjing Xu, Yunhe Wang. NeurIPS, 2020. (Spotlight)

[3] ShiftAddNet: A Hardware-Inspired Deep Network. Haoran You, Xiaohan Chen, Yongan Zhang, Chaojian Li, Sicheng Li, Zihao Liu, Zhangyang Wang, Yingyan Lin. NeurIPS, 2020.

[4] AdderSR: Towards Energy Efficient Image Super-Resolution. Dehua Song, Yunhe Wang, Hanting Chen, Chang Xu, Chunjing Xu, Dacheng Tao. arXiv, 2020.

Requirements

  • python 3
  • pytorch >= 1.1.0
  • torchvision

Preparation

You can follow pytorch/examples to prepare the ImageNet data.

The pretrained models are available on Google Drive or Baidu Cloud (access code: 126b).

Usage

Run python main.py to train on CIFAR-10.

Run python test.py --data_dir 'path/to/imagenet_root/' to evaluate on the ImageNet validation set. You will achieve 74.9% Top-1 accuracy and 91.7% Top-5 accuracy on the ImageNet dataset using ResNet-50.

Run python test.py --dataset cifar10 --model_dir models/ResNet20-AdderNet.pth --data_dir 'path/to/cifar10_root/' to evaluate on CIFAR-10. You will achieve 91.8% accuracy on the CIFAR-10 dataset using ResNet-20.

Inference and training of AdderNets are slow because the adder filters are implemented without CUDA acceleration. You can write a CUDA kernel to achieve higher speed.
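In the meantime, a vectorized route (an editor's sketch, not the repo's implementation) is to batch all the L1 distances through torch.cdist with p=1; the issues below suggest adder.py relies on torch.cdist as well, and its backward pass requires contiguous inputs:

    import torch
    import torch.nn.functional as F

    def adder2d_forward_cdist(x, weight, stride=1, padding=0):
        n, _, h, w = x.shape
        c_out, _, k, _ = weight.shape
        h_out = (h + 2 * padding - k) // stride + 1
        w_out = (w + 2 * padding - k) // stride + 1
        patches = F.unfold(x, k, stride=stride, padding=padding)  # (N, C_in*k*k, L)
        w_flat = weight.view(1, c_out, -1).expand(n, -1, -1).contiguous()
        # cdist(p=1) computes all pairwise L1 distances in one call; the
        # .contiguous() calls matter because _cdist_backward requires
        # contiguous inputs (see the issue below).
        dists = torch.cdist(patches.transpose(1, 2).contiguous(), w_flat, p=1)
        return -dists.transpose(1, 2).view(n, c_out, h_out, w_out)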

Citation

@inproceedings{AdderNet,
	title={AdderNet: Do We Really Need Multiplications in Deep Learning?},
	author={Chen, Hanting and Wang, Yunhe and Xu, Chunjing and Shi, Boxin and Xu, Chao and Tian, Qi and Xu, Chang},
	booktitle={CVPR},
	year={2020}
}

Contributing

We appreciate all contributions. If you are planning to contribute back bug-fixes, please do so without any further discussion.

If you plan to contribute new features, utility functions, or extensions to the core, please first open an issue and discuss the feature with us. Sending a PR without discussion might result in a rejected PR, because we might be taking the core in a different direction than you are aware of.

Comments
  • Modify YOLOv3 backbone from DarkNet to AdderNet


    How can I correctly modify https://github.com/eriklindernoren/PyTorch-YOLOv3 to use https://github.com/huawei-noah/AdderNet ?

    The following Colab notebook is what I have so far, with the help of others:

    https://colab.research.google.com/drive/1VCafwykgNKAO6144LssBFFy0TmruDNSE#scrollTo=W3e-WcVxnKfs

    How can I solve the error in this models.py file?


    Namespace(batch_size=8, class_path='data/coco.names', conf_thres=0.001, data_config='config/coco.data', img_size=416, iou_thres=0.5, model_def='config/yolov3.cfg', n_cpu=8, nms_thres=0.5, weights_path='weights/yolov3.weights')
    Traceback (most recent call last):
      File "test.py", line 84, in <module>
        model.load_darknet_weights(opt.weights_path)
      File "/content/PyTorch-YOLOv3/models.py", line 321, in load_darknet_weights
        num_w = conv_layer.weight.numel()
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 576, in __getattr__
        type(self).__name__, name))
    AttributeError: 'adder2d' object has no attribute 'weight'
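
    A possible workaround (an editor's sketch, not an official fix): adder2d stores its filter as .adder rather than .weight (see adder.py), so Darknet weight-loading code that reads conv_layer.weight needs a fallback. weights and ptr below are the existing variables in PyTorch-YOLOv3's load_darknet_weights:

        # Editor's sketch: adder2d keeps its filter in `.adder`, not `.weight`.
        param = conv_layer.adder if hasattr(conv_layer, 'adder') else conv_layer.weight
        num_w = param.numel()
        conv_w = torch.from_numpy(weights[ptr:ptr + num_w]).view_as(param)
        param.data.copy_(conv_w)
        ptr += num_w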
    
    opened by buttercutter 20
  • RuntimeError: _cdist_backward requires X2 to be contiguous


    Hi, I am trying to train your AdderNet, but it returns a runtime error, which I suspect is due to the .contiguous() calls or some other uncommon operations used in your adder.py.

    Could you help to solve this issue?

    opened by ranery 18
  • Using too much gpu memory while training on addernet with ImageNet?


    I tried to train an AdderNet ResNet-18 on ImageNet from scratch with four 1080Ti cards, but it occupies so much memory that I could only set the batch size to 16, and it is also far too slow.

    For comparison, I tried replacing the adder filters with normal conv filters, and the four GPU cards could handle a batch size of 128. Did I set something up wrong, or is this currently the normal case for AdderNet?

    Have you guys tried to train with ImageNet?

    opened by Tsings04 13
  • About the training accuracy.


    Hi, Hanting Chen,

    I tried the same training setting (poly LR schedule with power 0.9; 400 epochs; batch size 256), but I can only get 89.16% accuracy when training ResNet-20 on CIFAR-10, while the reported accuracy is 91.84%.

    Could you provide more training details and help explain the gap? Thanks.
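
    For reference, a minimal sketch of the poly schedule described above (an editor's example; poly_lr is a hypothetical helper, not a function from main.py):

        # Poly decay: lr = base_lr * (1 - epoch / total_epochs) ** power
        def poly_lr(base_lr, epoch, total_epochs=400, power=0.9):
            return base_lr * (1 - epoch / total_epochs) ** power

        # e.g. poly_lr(0.1, 200) == 0.1 * 0.5 ** 0.9 ≈ 0.0536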

    opened by ranery 8
  • Equation (5) - partial derivative of the Euclidean norm


    Hi, I would like to know why you defined the L2 distance as in Equation (14) of the appendix. Doesn't the L2 distance need a square root outside the summations? I would also like to know how the corresponding partial derivative of the L2 distance in Equation (5) is derived. Thanks.
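
    One possible reading (an editor's note, not the authors' reply): if Eq. (14) defines the squared L2 distance, no square root is needed, and differentiating the squared form directly yields the linear gradient of Eq. (5) up to a constant factor:

        Y(m,n,t) = -\sum_{i,j,k} \big(X(m+i,\,n+j,\,k) - F(i,j,k,t)\big)^2
        \;\Rightarrow\;
        \frac{\partial Y(m,n,t)}{\partial F(i,j,k,t)} = 2\,\big(X(m+i,\,n+j,\,k) - F(i,j,k,t)\big)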

    opened by andgitchang 4
  • Is adder.adder2d slower than torch.nn.Conv2d? I'm confused.


    I replaced adder.adder2d with torch.nn.Conv2d, and replaced torch.cdist with the my_cdist below. After training my network, the new model is at least 6x slower than the old one. I'm confused.

    @torch.jit.script
    def my_cdist(x1, x2):
        # Squared norms of each row: ||x1||^2 and ||x2||^2.
        x1_norm = x1.pow(2).sum(dim=-1, keepdim=True)
        x2_norm = x2.pow(2).sum(dim=-1, keepdim=True)
        # ||x1 - x2||^2 = ||x1||^2 - 2*x1@x2^T + ||x2||^2, fused into one addmm.
        res = torch.addmm(x2_norm.transpose(-2, -1), x1, x2.transpose(-2, -1), alpha=-2).add_(x1_norm)
        # Clamp to avoid sqrt of tiny negatives caused by floating-point error.
        res = res.clamp_min_(1e-30).sqrt_()

        return res
    
    opened by zccyman 4
  • Hello, a question about the Image Processing Transformer


    Hello, sorry to bother you. I read your team's Pre-Trained Image Processing Transformer paper and benefited a lot. I am trying to reproduce the paper, but I ran into problems writing the Dataloader and Datasets: I do not know how to crop the images or how to preprocess the data. The paper mentions a 10-pixel overlap, but how are the edge regions handled?
    If you could share some insight, I would be very grateful. Thanks, and I look forward to your reply.

    opened by jiaaihhy 3
  • Please update the unfair comparison with ShiftAddNet


    Hi,

    We recently noticed that you updated the comparison with ShiftAddNet in your readme.

    However, the chosen ShiftAddNet accuracy (i.e., 85.10%) was wrongly taken from our adaptation experiments, in which we only used half of the CIFAR-10 dataset for pre-training. Our accuracy (ResNet-20 on CIFAR-10) should be 89.32%, which was also obtained with a different training setting and implementation (e.g., only 160 epochs).

    Could you help to update the comparison results?

    Best regards, Haoran You

    opened by ranery 3
  • Transfer learning: fine-tuning the pretrained ResNet-50 model on our own dataset


    Hello. Following https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html, we tried to fine-tune the network on our own dataset and ran into difficulties. In an ordinary ResNet we read in_features from the fc layer to set up fine-tuning, but in AdderNet's ResNet the fc layer is a conv2d, and I do not know how to get the input of the fc layer, i.e., the output of avgpool. Looking forward to your answer; thank you very much.
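
    One way around this (an editor's sketch; it assumes the AdderNet ResNet follows the torchvision layout with a module named avgpool): capture the pooled features with a forward hook and train your own classifier head on them:

        import torch.nn as nn

        feats = {}
        def save_avgpool(module, inputs, output):
            # Flatten the (N, C, 1, 1) pooled features to (N, C).
            feats['avgpool'] = output.flatten(1)

        # `model` is the loaded AdderNet ResNet; assumes it exposes `avgpool`.
        model.avgpool.register_forward_hook(save_avgpool)
        _ = model(images)  # any forward pass fills feats['avgpool']
        num_classes = 10   # e.g. for a 10-class target dataset
        head = nn.Linear(feats['avgpool'].shape[1], num_classes)
        logits = head(feats['avgpool'])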

    opened by 284513016 3
  • RuntimeError: function adderBackward returned an incorrect number of gradients (expected 2, got 1)


    First, I really appreciate your work. I met some problems when I tried to run python main.py:

        File "main.py", line 117, in <module>
          main()
        File "main.py", line 112, in main
          train_and_test(e)
        File "main.py", line 105, in train_and_test
          train(epoch)
        File "main.py", line 80, in train
          loss.backward()
        File "/home/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 102, in backward
          torch.autograd.backward(self, gradient, retain_graph, create_graph)
        File "/home/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 90, in backward
          allow_unreachable=True)  # allow_unreachable flag
        RuntimeError: function adderBackward returned an incorrect number of gradients (expected 2, got 1)

    opened by PingLi00 3
  • On fixing the adder kernels for the addition operation


    Hello, I want to fix the adder kernels and use addition in place of convolution. I put the generated kernel tensor directly into self.adder in output = adder2d_function(x, self.adder, self.stride, self.padding), but the results are far worse than with a fixed convolution kernel. I am not sure whether this comes from the distribution difference between adder kernels and convolution kernels or from something else. Do you have a method for running the adder operation with fixed adder kernels? Thanks.

    opened by ldk-97 2
  • Migrating from a Simple CNN Architecture to AdderNet


    Hi. I have a CNN architecture that I trained on CIFAR-10 with and without AdderNet. I could reach over 80% accuracy without AdderNet, but with AdderNet it got stuck at 10% accuracy. Is there anything wrong with my implementation? All I did was replace nn.Conv2d with adder.adder2d. Isn't it supposed to work like this? How do you suggest I migrate from a simple CNN architecture to AdderNet? Thank you!

    opened by alarst13 1
  • A question about deploying AdderNet in MobileNet


    Hello, I recently tried to deploy adder layers in a MobileNet V2 network, but I ran into some problems. Could you help me?

    Using the source code here (with the training parameters aligned) I ran ResNet, and the accuracy roughly matches the report. But when I replace the pointwise layers of MobileNet V2 with adder-conv2d (the depthwise layers need to become group convolutions; I plan to change them once the pointwise layers work), accuracy drops by 2%.

    I did not replace the first and last layers. What could cause the 2% accuracy drop? Could it be a residual issue: does every layer need a residual connection (in MobileNet V2 there is no residual when stride=2 or the dimensions change)?

    opened by Jerryme-xxm 0
  • Questions about AdderSR


    Hello, while reproducing the paper I have a few questions:

    1. Since the paper mentions that shortcuts are needed to realize identity mappings, does the network contain no standalone adder layers (i.e., every adder layer sits inside a residual block)? Does that mean the standalone convolution layers in the original EDSR should all be replaced with adder residual blocks?
    2. Your paper states: "the above function (power activation function) can be easily embedded into the conventional ReLU in any SISR models." What is the concrete implementation? Can I simply append a power activation function after the ReLU?
    opened by CLC0530 2