Densely Connected Convolutional Networks, In CVPR 2017 (Best Paper Award).

Zhuang Liu

Last update: Jan 3, 2023


Overview

Densely Connected Convolutional Networks (DenseNets)

This repository contains the code for DenseNet, introduced in the following paper:

Densely Connected Convolutional Networks (CVPR 2017, Best Paper Award)

Gao Huang*, Zhuang Liu*, Laurens van der Maaten and Kilian Weinberger (* Authors contributed equally).

and its journal version

Convolutional Networks with Dense Connectivity (TPAMI 2019)

Gao Huang, Zhuang Liu, Geoff Pleiss, Laurens van der Maaten and Kilian Weinberger.

Now with memory-efficient implementation! Please check the technical report and code for more information.

The code is built on fb.resnet.torch.

Citation

If you find DenseNet useful in your research, please consider citing:

@article{huang2019convolutional,
 title={Convolutional Networks with Dense Connectivity},
 author={Huang, Gao and Liu, Zhuang and Pleiss, Geoff and Van Der Maaten, Laurens and Weinberger, Kilian},
 journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
 year={2019}
 }
 
@inproceedings{huang2017densely,
  title={Densely Connected Convolutional Networks},
  author={Huang, Gao and Liu, Zhuang and van der Maaten, Laurens and Weinberger, Kilian Q },
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2017}
}

Other Implementations

Our [Caffe], Our memory-efficient [Caffe], Our memory-efficient [PyTorch], [PyTorch] by Andreas Veit, [PyTorch] by Brandon Amos, [PyTorch] by Federico Baldassarre, [MXNet] by Nicatio, [MXNet] by Xiong Lin, [MXNet] by miraclewkf, [Tensorflow] by Yixuan Li, [Tensorflow] by Laurent Mazare, [Tensorflow] by Illarion Khlestov, [Lasagne] by Jan Schlüter, [Keras] by tdeboissiere,
[Keras] by Roberto de Moura Estevão Filho, [Keras] by Somshubra Majumdar, [Chainer] by Toshinori Hanya, [Chainer] by Yasunori Kudo, [Torch 3D-DenseNet] by Barry Kui, [Keras] by Christopher Masch, [Tensorflow2] by Gaston Rios and Ulises Jeremias Cornejo Fandos.

Note that we only listed some early implementations here. If you would like to add yours, please submit a pull request.

Some Following up Projects

Contents

Introduction
Usage
Results on CIFAR
Results on ImageNet and Pretrained Models
Updates

Introduction

DenseNet is a network architecture where each layer is directly connected to every other layer in a feed-forward fashion (within each dense block). For each layer, the feature maps of all preceding layers are treated as separate inputs, whereas its own feature maps are passed on as inputs to all subsequent layers. This connectivity pattern yields state-of-the-art accuracies on CIFAR10/100 (with or without data augmentation) and SVHN. On the large-scale ILSVRC 2012 (ImageNet) dataset, DenseNet achieves accuracy similar to ResNet while using fewer than half the parameters and roughly half the FLOPs.
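To make the connectivity pattern concrete, the following minimal sketch counts how many input channels each layer of a single dense block sees. The function names and the input width of 4 channels are assumptions chosen for illustration; the layer count and growth rate mirror the Figure 1 caption.

```python
def dense_block_input_channels(num_layers, growth_rate, in_channels):
    """Return the number of input channels seen by each layer of a dense block.

    Layer i (0-based) receives the block input plus the outputs of the
    i preceding layers, each of which contributes `growth_rate` channels.
    """
    return [in_channels + i * growth_rate for i in range(num_layers)]


def dense_block_output_channels(num_layers, growth_rate, in_channels):
    """Channels leaving the block: the block input plus every layer's new feature maps."""
    return in_channels + num_layers * growth_rate


# 5 layers with growth rate 4; the input width (4 channels) is an assumed value.
print(dense_block_input_channels(num_layers=5, growth_rate=4, in_channels=4))
print(dense_block_output_channels(num_layers=5, growth_rate=4, in_channels=4))
```

The input width grows linearly with the layer index, which is why a small growth rate is enough to keep individual layers narrow.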

Figure 1: A dense block with 5 layers and growth rate 4.

Figure 2: A deep DenseNet with three dense blocks.

Usage

Install Torch and required dependencies like cuDNN. See the instructions here for a step-by-step guide.
Clone this repo: git clone https://github.com/liuzhuang13/DenseNet.git

As an example, the following command trains a DenseNet-BC with depth L=100 and growth rate k=12 on CIFAR-10:

th main.lua -netType densenet -dataset cifar10 -batchSize 64 -nEpochs 300 -depth 100 -growthRate 12

As another example, the following command trains a DenseNet-BC with depth L=121 and growth rate k=32 on ImageNet:

th main.lua -netType densenet -dataset imagenet -data [dataFolder] -batchSize 256 -nEpochs 90 -depth 121 -growthRate 32 -nGPU 4 -nThreads 16 -optMemory 3

Please refer to fb.resnet.torch for data preparation.

DenseNet and DenseNet-BC

By default, the code runs with the DenseNet-BC architecture, which has 1x1 convolutional bottleneck layers and compresses the number of channels at each transition layer by a factor of 0.5. To run the original DenseNet, use the options -bottleneck false and -reduction 1.
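The effect of the two options on network width can be traced with a short sketch. It is not taken from this repository's code; it assumes a three-block CIFAR-style network whose first convolution outputs twice the growth rate, and the function name and argument names are hypothetical.

```python
import math


def densenet_channel_plan(depth, growth_rate, bottleneck=True, reduction=0.5):
    """Trace (input, output) channel counts of the three dense blocks.

    Assumptions: the first convolution outputs 2 * growth_rate channels and
    the depth is split evenly over three dense blocks.
    """
    layers_per_block = (depth - 4) // 3
    if bottleneck:
        # Each bottleneck layer contains a 1x1 and a 3x3 convolution.
        layers_per_block //= 2
    channels = 2 * growth_rate
    plan = []
    for block in range(3):
        block_in = channels
        channels += layers_per_block * growth_rate
        plan.append((block_in, channels))
        if block < 2:
            # A transition layer compresses the channel count by `reduction`.
            channels = math.floor(channels * reduction)
    return layers_per_block, plan


# DenseNet-BC (L=100, k=12) versus the original DenseNet (L=40, k=12).
print(densenet_channel_plan(100, 12))
print(densenet_channel_plan(40, 12, bottleneck=False, reduction=1))
```

With reduction set to 1 the transition layers leave the channel count unchanged, so each block starts as wide as the previous one ended.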

Memory efficient implementation (newly added feature on June 6, 2017)

The -optMemory option is very useful for reducing the GPU memory footprint when training a DenseNet. By default, the value is set to 2, which activates the shareGradInput function (with small modifications from here). There are two extremely memory-efficient modes (-optMemory 3 or -optMemory 4) which use a customized densely connected layer. With -optMemory 4, the largest 190-layer DenseNet-BC on CIFAR can be trained on a single NVIDIA TitanX GPU (uses 8.3G of 12G) instead of fully using four GPUs with the standard (recursive concatenation) implementation.
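A rough way to see why recursive concatenation is costly is to count the channels that must be stored in one dense block. The sketch below is a simplified model, not a description of what -optMemory does internally: it assumes that the standard approach materialises a fresh concatenated input for every layer, that a shared-storage approach keeps one growing buffer, and it ignores batch-norm outputs and gradients. Both function names are hypothetical.

```python
def naive_concat_channels(num_layers, growth_rate, in_channels):
    """Channels stored when every layer keeps its own concatenated input copy."""
    return sum(in_channels + i * growth_rate for i in range(num_layers))


def shared_storage_channels(num_layers, growth_rate, in_channels):
    """Channels stored when all layers read from one shared, growing buffer."""
    return in_channels + num_layers * growth_rate


# One block of 16 layers with growth rate 12 and 24 input channels (assumed values).
naive = naive_concat_channels(16, 12, 24)
shared = shared_storage_channels(16, 12, 24)
print(naive, shared, round(naive / shared, 1))
```

Under these assumptions the first count grows quadratically with the number of layers and the second grows linearly, which is the gap a customized densely connected layer aims to close.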

More details about the memory efficient implementation are discussed here.

Results on CIFAR

The table below shows the test error rates (%) of DenseNets on the CIFAR datasets. A "+" at the end of a dataset name denotes standard data augmentation (random crop after zero-padding, and horizontal flip). For a DenseNet model, L denotes its depth and k denotes its growth rate. On CIFAR-10 and CIFAR-100 without data augmentation, a Dropout layer with drop rate 0.2 is introduced after each convolutional layer except the very first one.

Model	Parameters	CIFAR-10	CIFAR-10+	CIFAR-100	CIFAR-100+
DenseNet (L=40, k=12)	1.0M	7.00	5.24	27.55	24.42
DenseNet (L=100, k=12)	7.0M	5.77	4.10	23.79	20.20
DenseNet (L=100, k=24)	27.2M	5.83	3.74	23.42	19.25
DenseNet-BC (L=100, k=12)	0.8M	5.92	4.51	24.15	22.27
DenseNet-BC (L=250, k=24)	15.3M	5.19	3.62	19.64	17.60
DenseNet-BC (L=190, k=40)	25.6M	-	3.46	-	17.18
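The Parameters column for the DenseNet-BC rows can be approximated with a small counting sketch. It is an illustration rather than the repository's own code, and it assumes bias-free convolutions, two learnable parameters per batch-norm channel, a first 3x3 convolution with 2k output channels, a bottleneck width of 4k, and a linear classifier with bias. The function name is hypothetical.

```python
def densenet_bc_params(depth, growth_rate, num_classes=10, reduction=0.5):
    """Count learnable parameters of a three-block DenseNet-BC for RGB inputs."""
    n = (depth - 4) // 6                 # bottleneck layers per dense block
    k = growth_rate
    c = 2 * k                            # channels after the first convolution
    total = 3 * c * 9                    # first 3x3 convolution
    for block in range(3):
        for _ in range(n):
            total += 2 * c + c * 4 * k            # BN + 1x1 bottleneck convolution
            total += 2 * 4 * k + 4 * k * k * 9    # BN + 3x3 convolution
            c += k                                # new feature maps are concatenated
        if block < 2:
            out = int(c * reduction)
            total += 2 * c + c * out              # BN + 1x1 transition convolution
            c = out
    total += 2 * c                                # final BN
    total += c * num_classes + num_classes        # linear classifier with bias
    return total


for depth, k in [(100, 12), (250, 24)]:
    print(depth, k, f"{densenet_bc_params(depth, k) / 1e6:.1f}M")
```

Under these assumptions, L=100 with k=12 comes to 769,162 parameters for 10 classes, which rounds to the 0.8M listed in the table.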

Results on ImageNet and Pretrained Models

Torch

Models in the original paper

The Torch models are trained under the same setting as in fb.resnet.torch. The error rates shown are 224x224 1-crop test errors.

Network	Top-1 error (%)	Torch Model
DenseNet-121 (k=32)	25.0	Download (64.5MB)
DenseNet-169 (k=32)	23.6	Download (114.4MB)
DenseNet-201 (k=32)	22.5	Download (161.8MB)
DenseNet-161 (k=48)	22.2	Download (230.8MB)

Models in the tech report

More accurate models trained with the memory efficient implementation in the technical report.

Network	Top-1 error (%)	Torch Model
DenseNet-264 (k=32)	22.1	Download (256MB)
DenseNet-232 (k=48)	21.2	Download (426MB)
DenseNet-cosine-264 (k=32)	21.6	Download (256MB)
DenseNet-cosine-264 (k=48)	20.4	Download (557MB)

Caffe

https://github.com/shicai/DenseNet-Caffe.

PyTorch

PyTorch documentation on models. We would like to thank @gpleiss for this nice work in PyTorch.

Keras, Tensorflow and Theano

https://github.com/flyyufelix/DenseNet-Keras.

MXNet

https://github.com/miraclewkf/DenseNet.

Wide-DenseNet for better Time/Accuracy and Memory/Accuracy Tradeoff

If you use DenseNet as a model in your learning task, we recommend using a wide and shallow DenseNet to reduce memory and time consumption, following the strategy of wide residual networks. To obtain a wide DenseNet, we set the depth to be smaller (e.g., L=40) and the growthRate to be larger (e.g., k=48).

We tested a set of Wide-DenseNet-BCs and compared their memory and time with the DenseNet-BC (L=100, k=12) shown above. We obtained the statistics using a single TITAN X card, with batch size 64, and without any memory optimization.

Model	Parameters	CIFAR-10+	CIFAR-100+	Time per Iteration	Memory
DenseNet-BC (L=100, k=12)	0.8M	4.51	22.27	0.156s	5452MB
Wide-DenseNet-BC (L=40, k=36)	1.5M	4.58	22.30	0.130s	4008MB
Wide-DenseNet-BC (L=40, k=48)	2.7M	3.99	20.29	0.165s	5245MB
Wide-DenseNet-BC (L=40, k=60)	4.3M	4.01	19.99	0.223s	6508MB

Observations:

Wide-DenseNet-BC (L=40, k=36) uses less memory/time while achieving about the same accuracy as DenseNet-BC (L=100, k=12).
Wide-DenseNet-BC (L=40, k=48) uses about the same memory/time as DenseNet-BC (L=100, k=12), while being much more accurate.

Thus, for practical use, we suggest picking one model from those Wide-DenseNet-BCs.
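The observations above can be checked by expressing each wide model's cost relative to the baseline. The sketch below copies the time and memory figures from the table; the helper name relative_cost is hypothetical.

```python
# (name, time per iteration in seconds, memory in MB), copied from the table above.
BASELINE = ("DenseNet-BC (L=100, k=12)", 0.156, 5452)
WIDE_MODELS = [
    ("Wide-DenseNet-BC (L=40, k=36)", 0.130, 4008),
    ("Wide-DenseNet-BC (L=40, k=48)", 0.165, 5245),
    ("Wide-DenseNet-BC (L=40, k=60)", 0.223, 6508),
]


def relative_cost(model, baseline=BASELINE):
    """Return (time ratio, memory ratio) of a model with respect to the baseline."""
    _, time_s, mem_mb = model
    _, base_time, base_mem = baseline
    return time_s / base_time, mem_mb / base_mem


for model in WIDE_MODELS:
    time_ratio, mem_ratio = relative_cost(model)
    print(f"{model[0]}: time x{time_ratio:.2f}, memory x{mem_ratio:.2f}")
```

A ratio below 1 means the wide model is cheaper than DenseNet-BC (L=100, k=12) on that axis.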

Updates

12/10/2019:

Journal version (accepted by IEEE TPAMI) released.

08/23/2017:

Add supporting code, so one can simply git clone and run.

06/06/2017:

Support ultra memory efficient training of DenseNet with customized densely connected layer.
Support memory efficient training of DenseNet with standard densely connected layer (recursive concatenation) by fixing the shareGradInput function.

05/17/2017:

Add Wide-DenseNet.
Add keras, tf, theano link for pretrained models.

04/20/2017:

Add usage of models in PyTorch.

03/29/2017:

Add the code for ImageNet training.

12/03/2016:

Add ImageNet results and pretrained models.
Add DenseNet-BC structures.

Contact

liuzhuangthu at gmail.com
gaohuang at tsinghua.edu.cn
Any discussions, suggestions and questions are welcome!

Comments

ImageNet test

I'm trying to build a DenseNet for the ImageNet dataset, but it doesn't converge well. Have you ever tried DenseNet on the ImageNet dataset? Please share it if you have any successful DenseNet network for ImageNet.

opened by dccho 17
Parameters and computation

Hi there and great work! I've actually also figured out the very same concept myself prior to finding out you guys have already tested and published it. ✍(◔◡◔) Some of the design decisions I've made were different, so I'd like to compare.

Where you're reporting results on Cifars, if you could also add the number of parameters you are using and, possibly, an estimated amount of computation, that would be highly beneficial. It's really necessary for serious comparisons and ability to perfect even this very architecture. Also, if you could add your training logs that would also be of great insight.

As for how to measure the amount of computation, that's quite a tough thing to do, so I'd recommend to at least measure training time, which is a very inexact measure, but, well, provides at least some insights.

I've had 19.5% on Cifar-100+ with mean and std not adjusted (whole dataset just scaled to [0..1] values) with 24m params and forward+backward running for 220 sec / epoch on GTX Titan X with the best dense-type architecture that I designed prior (I could only experiment on a single GTX Titan X, don't really have a lot of computational resources). It didn't have preactivation. It would most likely at least match the results you've published for DenseNet (L=100, k=24) CIFAR-100+ if I used the right dataset (with std and mean adjusted). My code: https://github.com/ibmua/Breaking-Cifar/blob/master/models/hoard-2-x.lua (uses 4-spaced tabs; to achieve that result I used depth=2 sequences=2; here's a log of the end of training: https://github.com/ibmua/Breaking-Cifar/blob/master/logs/load_59251794/log.txt ). Mind that I used groups, which are only accessible via Soumith's "cudnn", so if you want to try this you probably want to clone the whole thing. Also, note that I didn't use any Dropout (haven't even tried).

opened by ibmua 15
I tried to reproduce Wide-DenseNet-BC results on cifar10, but got 0.5% more than your error

I tested Wide-DenseNet-BC (L=40, k=48) on CIFAR-10 augmentation, see https://github.com/seasonyc/densenet/blob/bf99d7f459ca7754c37ff58c6610eb76e93f7990/cifar10-test.py#L217 in https://github.com/seasonyc/densenet but could only get 4.5% error rate.

I tried to tune some hyperparameters, e.g. dropout, weight decay, learning rate... but could never get a better result. Now I am testing the lr decay schedule of wide resnet training, i.e. initial 0.1, multiplied by 0.2 every 60 epochs, but I doubt it will have an effect...

Would you like to give me any suggestion for it?

Thanks YC

opened by seasonyc 4
What is proper way of counting parameters?
Hi author, as you claimed in both repo and paper, the number of parameters of densenet-100-12 is 7.0M and densenet-100-24 is 27.72M. However, when I examine the parameters in the following way

-- main.lua, line 32
-- Create model
local model, criterion = models.setup(opt, checkpoint)
params = model:getParameters()
print(#params)

I got 4.06M for densenet-100-12 and 16.11M for densenet-100-24. Did I count it in a wrong way?
opened by Lyken17 4
Memory efficient implementation of Caffe
Hi, I saw this caffe implementation which is memory efficient. https://github.com/Tongcheng/DN_CaffeScript

And I also notice this in wiki

Memory efficient implementation (newly added feature on June 6, 2017) There is an option -optMemory which is very useful for reducing GPU memory footprint when training a DenseNet. By default, the value is set to 2, which activates the shareGradInput function ....

Does that Caffe implementation use the memory-efficient approach described above?

Thanks.
opened by haikuoyao 4
DenseNet architecture question

I may be misunderstanding the architecture, but why does DenseNet decide to concatenate feature maps from the current layer to pass backward instead of using "true" residual connections?

opened by suryabhupa 4
Purpose using first convolution

In your network architecture for the CIFAR and ImageNet datasets, what is the purpose of the first convolution (before pooling-denseblock1)? For ImageNet you use two convolution blocks before entering the dense block, while for CIFAR just one. Any reason? Thanks

opened by John1231983 3
Wide-DenseNet

Congrats for best paper on CVPR 2017! I'm troubled with the memory problem with densenet. Would you share your wide-densenet implementation and pre-train models publicly?

Best!

opened by 1292765944 3
Deep-Narrow DenseNet

I was wondering if you ever tried the extreme case growth_rate = 1 with a very deep network? Just as an exercise I implemented a fully-connected dense block with growth_rate = 1 and depth = 50 on a 2D dataset so I could visualize what each neuron was learning, and the results were very nice.

opened by cgarciae 3
Convolution after ReLU in Dense Layer Question

I've seen that you use:

BN -> ReLU -> Conv3x3 -> Dropout

on the normal case, or

BN -> ReLU -> Conv1x1 -> Dropout -> BN -> ReLU -> Conv3x3 -> Dropout

when using bottleneck. The question is why? Most networks use e.g.

Conv3x3 -> BN -> ReLU -> Dropout

Why did you invert the order? Did you get better results this way?

Thanks in advance!

opened by cgarciae 3
About a tensorflow implementation

I've followed one of the Tensorflow implementations of DenseNet (https://github.com/ikhlestov/vision_networks) to reproduce DenseNet-BC-100-12. It seemed to me that the Tensorflow implementation is nearly equivalent to the one from this repo, but I couldn't reach ~4.5% error (the best one was about ~4.8%, by the way). Could you give me any reasons why that is? I already compared the two codes very carefully, but couldn't find any difference.

opened by jh-jeong 3
image classification

I take the liberty to bother you. I want to ask you a question about image classification, but the images are not images in the usual sense. The experiment is to detect different objects with collected WiFi signals. The above figure shows the rssi and phase change curves of the WiFi signal when a bottle is placed indoors. Now I want to classify objects according to these curves. Currently, Resnet50 and DenseNet201 are used, but the accuracy is not high, only 80. So I would like to ask what kind of network structure is better for classifying this kind of image with deep learning?

opened by ewwll 0
Receptive field of DenseNet

Hi,

What's the receptive field of densenet121 and densenet161? The Distill blog post doesn't mention it. Is it more than that of resnet101 (RF: 1027)?

opened by kHarshit 0
Question on the last transition layer

Hi! Thank you for publishing this. I'm following the PyTorch implementation of DenseNet; specifically, I'm using densenet161 for extracting features from images. I'm wondering why, in your implementation here, you add an additional transition layer, consisting of batch normalization and ReLU, after the last dense block. I don't see any mention of those operations in the paper. Am I missing something? I'm asking because I'm wondering how those transitions influence the quality of the learnt features when we use DenseNet not for classifying images but rather as a feature extractor. Thanks!

opened by zlenyk 0
Question on impede information flow

Hi, thanks for your great work. Section 3 of the paper says: "However, the identity function and the output of H_l are combined by summation, which may impede the information flow". I did not understand that. There is also a similar question on Stack Overflow: https://stackoverflow.com/questions/52696110/resnet-how-could-summation-impede-the-information-flow Looking forward to your reply.

opened by Sirius083 0
Question on channel before entering the first block

Hello, I want to reproduce the results on densenet-cifar10/cifar100, but got lower accuracy with a tensorflow implementation. I have one question about the model architecture. The Implementation Details part of the paper says: "Before entering the first dense block, a convolution with 16 (or twice the growth rate for DenseNet-BC) output channels is performed on the input images." However, the code seemingly always uses twice the growth rate before the first block. https://github.com/liuzhuang13/DenseNet/blob/master/models/densenet.lua#L15 https://github.com/liuzhuang13/DenseNet/blob/master/models/densenet.lua#L70 Since I did not get the same accuracy level with a pytorch implementation (25.53% on cifar100 d_40_k_12_no_bottleneck, versus 24.42% in the paper, following the same data augmentation as your official code), I am wondering whether this caused the difference. (Since I am a beginner in pytorch, maybe it is in another part of the code. Can you point it out?) Thanks in advance.

opened by Sirius083 2

