Code for the paper "AdderNet: Do We Really Need Multiplications in Deep Learning?"

Overview

AdderNet: Do We Really Need Multiplications in Deep Learning?

This code is a demo of the CVPR 2020 paper AdderNet: Do We Really Need Multiplications in Deep Learning?

We present adder networks (AdderNets) to trade the massive multiplications in deep neural networks, especially convolutional neural networks (CNNs), for much cheaper additions to reduce computation costs. In AdderNets, we take the L1-norm distance between the filters and the input features as the output response. As a result, the proposed AdderNets achieve 74.9% Top-1 accuracy and 91.7% Top-5 accuracy with ResNet-50 on the ImageNet dataset, without any multiplications in the convolutional layers.
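For illustration, here is a minimal sketch of an adder layer's forward pass (an editor's example, not the repo's adder.py; it favors clarity over memory efficiency). The output is the negative L1 distance between each filter and each input patch, so the layer itself needs no multiplications:

    import torch
    import torch.nn.functional as F

    def adder2d_forward(x, weight, stride=1, padding=0):
        # x: (N, C_in, H, W); weight: (C_out, C_in, k, k)
        n, _, h, w = x.shape
        c_out, _, k, _ = weight.shape
        h_out = (h + 2 * padding - k) // stride + 1
        w_out = (w + 2 * padding - k) // stride + 1
        # Sliding patches: (N, C_in*k*k, L) with L = h_out * w_out.
        patches = F.unfold(x, k, stride=stride, padding=padding)
        w_flat = weight.view(c_out, -1)
        # Output response: negative L1 distance between filters and patches,
        # i.e. only subtractions, absolute values, and sums.
        out = -(patches.unsqueeze(1) - w_flat[None, :, :, None]).abs().sum(dim=2)
        return out.view(n, c_out, h_out, w_out)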

UPDATE: The training code was released on June 28.

Run python main.py to train on CIFAR-10.

UPDATE: The model zoo for AdderNets was released on November 27.

Classification results on the CIFAR-10 and CIFAR-100 datasets.

| Model     | Method           | CIFAR-10            | CIFAR-100 |
|-----------|------------------|---------------------|-----------|
| VGG-small | ANN [1]          | 93.72%              | 74.58%    |
| VGG-small | PKKD ANN [2]     | 95.03%              | 76.94%    |
| ResNet-20 | ANN              | 92.02%              | 67.60%    |
| ResNet-20 | PKKD ANN         | 92.96%              | 69.93%    |
| ResNet-20 | ShiftAddNet* [3] | 89.32% (160 epochs) | -         |
| ResNet-32 | ANN              | 93.01%              | 69.17%    |
| ResNet-32 | PKKD ANN         | 93.62%              | 72.41%    |

Classification results on the ImageNet dataset.

| Model     | Method       | Top-1 Acc | Top-5 Acc |
|-----------|--------------|-----------|-----------|
| ResNet-18 | CNN          | 69.8%     | 89.1%     |
| ResNet-18 | ANN [1]      | 67.0%     | 87.6%     |
| ResNet-18 | PKKD ANN [2] | 68.8%     | 88.6%     |
| ResNet-50 | CNN          | 76.2%     | 92.9%     |
| ResNet-50 | ANN          | 74.9%     | 91.7%     |
| ResNet-50 | PKKD ANN     | 76.8%     | 93.3%     |

Super-Resolution results on several SR datasets.

| Scale | Model | Method  | Set5 (PSNR/SSIM) | Set14 (PSNR/SSIM) | B100 (PSNR/SSIM) | Urban100 (PSNR/SSIM) |
|-------|-------|---------|------------------|-------------------|------------------|----------------------|
| ×2    | VDSR  | CNN     | 37.53/0.9587     | 33.03/0.9124      | 31.90/0.8960     | 30.76/0.9140         |
| ×2    | VDSR  | ANN [4] | 37.37/0.9575     | 32.91/0.9112      | 31.82/0.8947     | 30.48/0.9099         |
| ×2    | EDSR  | CNN     | 38.11/0.9601     | 33.92/0.9195      | 32.32/0.9013     | 32.93/0.9351         |
| ×2    | EDSR  | ANN     | 37.92/0.9589     | 33.82/0.9183      | 32.23/0.9000     | 32.63/0.9309         |
| ×3    | VDSR  | CNN     | 33.66/0.9213     | 29.77/0.8314      | 28.82/0.7976     | 27.14/0.8279         |
| ×3    | VDSR  | ANN     | 33.47/0.9151     | 29.62/0.8276      | 28.72/0.7953     | 26.95/0.8189         |
| ×3    | EDSR  | CNN     | 34.65/0.9282     | 30.52/0.8462      | 29.25/0.8093     | 28.80/0.8653         |
| ×3    | EDSR  | ANN     | 34.35/0.9212     | 30.33/0.8420      | 29.13/0.8068     | 28.54/0.8555         |
| ×4    | VDSR  | CNN     | 31.35/0.8838     | 28.01/0.7674      | 27.29/0.7251     | 25.18/0.7524         |
| ×4    | VDSR  | ANN     | 31.27/0.8762     | 27.93/0.7630      | 27.25/0.7229     | 25.09/0.7445         |
| ×4    | EDSR  | CNN     | 32.46/0.8968     | 28.80/0.7876      | 27.71/0.7420     | 26.64/0.8033         |
| ×4    | EDSR  | ANN     | 32.13/0.8864     | 28.57/0.7800      | 27.58/0.7368     | 26.33/0.7874         |

*ShiftAddNet [3] used a different training setting.

[1] AdderNet: Do We Really Need Multiplications in Deep Learning? Hanting Chen, Yunhe Wang, Chunjing Xu, Boxin Shi, Chao Xu, Qi Tian, Chang Xu. CVPR, 2020. (Oral)

[2] Kernel Based Progressive Distillation for Adder Neural Networks. Yixing Xu, Chang Xu, Xinghao Chen, Wei Zhang, Chunjing Xu, Yunhe Wang. NeurIPS, 2020. (Spotlight)

[3] ShiftAddNet: A Hardware-Inspired Deep Network. Haoran You, Xiaohan Chen, Yongan Zhang, Chaojian Li, Sicheng Li, Zihao Liu, Zhangyang Wang, Yingyan Lin. NeurIPS, 2020.

[4] AdderSR: Towards Energy Efficient Image Super-Resolution. Dehua Song, Yunhe Wang, Hanting Chen, Chang Xu, Chunjing Xu, Dacheng Tao. arXiv, 2020.

Requirements

  • python 3
  • pytorch >= 1.1.0
  • torchvision

Preparation

You can follow pytorch/examples to prepare the ImageNet data.

The pretrained models are available on Google Drive or Baidu Cloud (access code: 126b).

Usage

Run python main.py to train on CIFAR-10.

Run python test.py --data_dir 'path/to/imagenet_root/' to evaluate on the ImageNet validation set. You will achieve 74.9% Top-1 accuracy and 91.7% Top-5 accuracy on the ImageNet dataset using ResNet-50.

Run python test.py --dataset cifar10 --model_dir models/ResNet20-AdderNet.pth --data_dir 'path/to/cifar10_root/' to evaluate on CIFAR-10. You will achieve 91.8% accuracy on the CIFAR-10 dataset using ResNet-20.

Inference and training of AdderNets are slow because the adder filters are implemented without CUDA acceleration. You can write a CUDA kernel to achieve higher speed.
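In the meantime, a vectorized route (an editor's sketch, not the repo's implementation) is to batch all the L1 distances through torch.cdist with p=1; the issues below suggest adder.py relies on torch.cdist as well, and its backward pass requires contiguous inputs:

    import torch
    import torch.nn.functional as F

    def adder2d_forward_cdist(x, weight, stride=1, padding=0):
        n, _, h, w = x.shape
        c_out, _, k, _ = weight.shape
        h_out = (h + 2 * padding - k) // stride + 1
        w_out = (w + 2 * padding - k) // stride + 1
        patches = F.unfold(x, k, stride=stride, padding=padding)  # (N, C_in*k*k, L)
        w_flat = weight.view(1, c_out, -1).expand(n, -1, -1).contiguous()
        # cdist(p=1) computes all pairwise L1 distances in one call; the
        # .contiguous() calls matter because _cdist_backward requires
        # contiguous inputs (see the issue below).
        dists = torch.cdist(patches.transpose(1, 2).contiguous(), w_flat, p=1)
        return -dists.transpose(1, 2).view(n, c_out, h_out, w_out)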

Citation

@inproceedings{AdderNet,
	title={AdderNet: Do We Really Need Multiplications in Deep Learning?},
	author={Chen, Hanting and Wang, Yunhe and Xu, Chunjing and Shi, Boxin and Xu, Chao and Tian, Qi and Xu, Chang},
	booktitle={CVPR},
	year={2020}
}

Contributing

We appreciate all contributions. If you are planning to contribute back bug-fixes, please do so without any further discussion.

If you plan to contribute new features, utility functions, or extensions to the core, please first open an issue and discuss the feature with us. Sending a PR without discussion might result in a rejected PR, because we might be taking the core in a different direction than you are aware of.

Comments
  • Modify YOLOv3 backbone from DarkNet to AdderNet


    How can I correctly modify https://github.com/eriklindernoren/PyTorch-YOLOv3 to use https://github.com/huawei-noah/AdderNet ?

    The following Colab notebook is what I have so far, with the help of others:

    https://colab.research.google.com/drive/1VCafwykgNKAO6144LssBFFy0TmruDNSE#scrollTo=W3e-WcVxnKfs

    How can I solve the error in this models.py file?


    Namespace(batch_size=8, class_path='data/coco.names', conf_thres=0.001, data_config='config/coco.data', img_size=416, iou_thres=0.5, model_def='config/yolov3.cfg', n_cpu=8, nms_thres=0.5, weights_path='weights/yolov3.weights')
    Traceback (most recent call last):
      File "test.py", line 84, in <module>
        model.load_darknet_weights(opt.weights_path)
      File "/content/PyTorch-YOLOv3/models.py", line 321, in load_darknet_weights
        num_w = conv_layer.weight.numel()
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 576, in __getattr__
        type(self).__name__, name))
    AttributeError: 'adder2d' object has no attribute 'weight'
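
    A possible workaround (an editor's sketch, not an official fix): adder2d stores its filter as .adder rather than .weight (see adder.py), so Darknet weight-loading code that reads conv_layer.weight needs a fallback. weights and ptr below are the existing variables in PyTorch-YOLOv3's load_darknet_weights:

        # Editor's sketch: adder2d keeps its filter in `.adder`, not `.weight`.
        param = conv_layer.adder if hasattr(conv_layer, 'adder') else conv_layer.weight
        num_w = param.numel()
        conv_w = torch.from_numpy(weights[ptr:ptr + num_w]).view_as(param)
        param.data.copy_(conv_w)
        ptr += num_w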
    
    opened by buttercutter 20
  • RuntimeError: _cdist_backward requires X2 to be contiguous


    Hi, I am trying to train your AdderNet, but it returns a runtime error, which I suspect is due to the .contiguous() calls or some other uncommon operations used in your adder.py.

    Could you help to solve this issue?

    opened by ranery 18
  • Using too much gpu memory while training on addernet with ImageNet?


    I tried to train an AdderNet ResNet-18 on ImageNet from scratch with four 1080Ti cards, but it occupies so much memory that I could only set the batch size to 16, and it is also far too slow.

    For comparison, I tried replacing the adder filters with normal conv filters, and the four GPU cards could handle a batch size of 128. Did I set something up wrong, or is this currently the normal case for AdderNet?

    Have you guys tried to train with ImageNet?

    opened by Tsings04 13
  • About the training accuracy.


    Hi, Hanting Chen,

    I tried the same training setting (poly LR schedule with power 0.9; 400 epochs; batch size 256), but I can only get 89.16% accuracy when training ResNet-20 on CIFAR-10, while the reported accuracy is 91.84%.

    Could you provide more training details and help explain the gap? Thanks.
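
    For reference, a minimal sketch of the poly schedule described above (an editor's example; poly_lr is a hypothetical helper, not a function from main.py):

        # Poly decay: lr = base_lr * (1 - epoch / total_epochs) ** power
        def poly_lr(base_lr, epoch, total_epochs=400, power=0.9):
            return base_lr * (1 - epoch / total_epochs) ** power

        # e.g. poly_lr(0.1, 200) == 0.1 * 0.5 ** 0.9 ≈ 0.0536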

    opened by ranery 8
  • Equation (5) - partial derivative of the Euclidean norm


    Hi, I would like to know why you defined the L2 distance as in Equation (14) of the appendix. Doesn't the L2 distance need a square root outside the summations? I would also like to know how the corresponding partial derivative of the L2 distance in Equation (5) is derived. Thanks.
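
    One possible reading (an editor's note, not the authors' reply): if Eq. (14) defines the squared L2 distance, no square root is needed, and differentiating the squared form directly yields the linear gradient of Eq. (5) up to a constant factor:

        Y(m,n,t) = -\sum_{i,j,k} \big(X(m+i,\,n+j,\,k) - F(i,j,k,t)\big)^2
        \;\Rightarrow\;
        \frac{\partial Y(m,n,t)}{\partial F(i,j,k,t)} = 2\,\big(X(m+i,\,n+j,\,k) - F(i,j,k,t)\big)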

    opened by andgitchang 4
  • Is adder.adder2d slower than torch.nn.Conv2d? I'm confused.


    I replaced adder.adder2d with torch.nn.Conv2d, and replaced torch.cdist with the my_cdist below. After training my network, the new model is at least 6x slower than the old one. I'm confused.

    @torch.jit.script
    def my_cdist(x1, x2):
        # Squared norms of each row: ||x1||^2 and ||x2||^2.
        x1_norm = x1.pow(2).sum(dim=-1, keepdim=True)
        x2_norm = x2.pow(2).sum(dim=-1, keepdim=True)
        # ||x1 - x2||^2 = ||x1||^2 - 2*x1@x2^T + ||x2||^2, fused into one addmm.
        res = torch.addmm(x2_norm.transpose(-2, -1), x1, x2.transpose(-2, -1), alpha=-2).add_(x1_norm)
        # Clamp to avoid sqrt of tiny negatives caused by floating-point error.
        res = res.clamp_min_(1e-30).sqrt_()

        return res
    
    opened by zccyman 4
  • Hello, a question about the Image Processing Transformer


    Hello, sorry to bother you. I read your team's Pre-Trained Image Processing Transformer paper and benefited a lot. I am trying to reproduce the paper, but I ran into problems writing the Dataloader and Datasets: I do not know how to crop the images or how to preprocess the data. The paper mentions a 10-pixel overlap, but how are the edge regions handled?
    If you could share some insight, I would be very grateful. Thanks, and I look forward to your reply.

    opened by jiaaihhy 3
  • Please update the unfair comparison with ShiftAddNet


    Hi,

    We recently noticed that you updated the comparison with ShiftAddNet in your readme.

    However, the chosen ShiftAddNet accuracy (i.e., 85.10%) was wrongly taken from our adaptation experiments, in which we only used half of the CIFAR-10 dataset for pre-training. Our accuracy (ResNet-20 on CIFAR-10) should be 89.32%, which was also obtained with a different training setting and implementation (e.g., only 160 epochs).

    Could you help to update the comparison results?

    Best regards, Haoran You

    opened by ranery 3
  • Transfer learning: fine-tuning the pretrained ResNet-50 model on our own dataset


    Hello. Following https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html, we tried to fine-tune the network on our own dataset and ran into difficulties. In an ordinary ResNet we read in_features from the fc layer to set up fine-tuning, but in AdderNet's ResNet the fc layer is a conv2d, and I do not know how to get the input of the fc layer, i.e., the output of avgpool. Looking forward to your answer; thank you very much.
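
    One way around this (an editor's sketch; it assumes the AdderNet ResNet follows the torchvision layout with a module named avgpool): capture the pooled features with a forward hook and train your own classifier head on them:

        import torch.nn as nn

        feats = {}
        def save_avgpool(module, inputs, output):
            # Flatten the (N, C, 1, 1) pooled features to (N, C).
            feats['avgpool'] = output.flatten(1)

        # `model` is the loaded AdderNet ResNet; assumes it exposes `avgpool`.
        model.avgpool.register_forward_hook(save_avgpool)
        _ = model(images)  # any forward pass fills feats['avgpool']
        num_classes = 10   # e.g. for a 10-class target dataset
        head = nn.Linear(feats['avgpool'].shape[1], num_classes)
        logits = head(feats['avgpool'])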

    opened by 284513016 3
  • RuntimeError: function adderBackward returned an incorrect number of gradients (expected 2, got 1)


    First, I really appreciate your work. I met some problems when I tried to run python main.py:

        File "main.py", line 117, in <module>
          main()
        File "main.py", line 112, in main
          train_and_test(e)
        File "main.py", line 105, in train_and_test
          train(epoch)
        File "main.py", line 80, in train
          loss.backward()
        File "/home/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 102, in backward
          torch.autograd.backward(self, gradient, retain_graph, create_graph)
        File "/home/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 90, in backward
          allow_unreachable=True)  # allow_unreachable flag
        RuntimeError: function adderBackward returned an incorrect number of gradients (expected 2, got 1)

    opened by PingLi00 3
  • On fixing the adder kernels for the addition operation


    Hello, I want to fix the adder kernels and use addition in place of convolution. I put the generated kernel tensor directly into self.adder in output = adder2d_function(x, self.adder, self.stride, self.padding), but the results are far worse than with a fixed convolution kernel. I am not sure whether this comes from the distribution difference between adder kernels and convolution kernels or from something else. Do you have a method for running the adder operation with fixed adder kernels? Thanks.

    opened by ldk-97 2
  • Migrating from a Simple CNN Architecture to AdderNet


    Hi. I have a CNN architecture that I trained on CIFAR-10 with and without AdderNet. I could reach over 80% accuracy without AdderNet, but with AdderNet it got stuck at 10% accuracy. Is there anything wrong with my implementation? All I did was replace nn.Conv2d with adder.adder2d. Isn't it supposed to work like this? How do you suggest I migrate from a simple CNN architecture to AdderNet? Thank you!

    opened by alarst13 1
  • A question about deploying AdderNet in MobileNet


    Hello, I recently tried to deploy adder layers in a MobileNet V2 network, but I ran into some problems. Could you help me?

    Using the source code here (with the training parameters aligned) I ran ResNet, and the accuracy roughly matches the report. But when I replace the pointwise layers of MobileNet V2 with adder-conv2d (the depthwise layers need to become group convolutions; I plan to change them once the pointwise layers work), accuracy drops by 2%.

    I did not replace the first and last layers. What could cause the 2% accuracy drop? Could it be a residual issue: does every layer need a residual connection (in MobileNet V2 there is no residual when stride=2 or the dimensions change)?

    opened by Jerryme-xxm 0
  • Questions about AdderSR


    Hello, while reproducing the paper I have a few questions:

    1. Since the paper mentions that shortcuts are needed to realize identity mappings, does the network contain no standalone adder layers (i.e., every adder layer sits inside a residual block)? Does that mean the standalone convolution layers in the original EDSR should all be replaced with adder residual blocks?
    2. Your paper states: "the above function (power activation function) can be easily embedded into the conventional ReLU in any SISR models." What is the concrete implementation? Can I simply append a power activation function after the ReLU?
    opened by CLC0530 2