A PyTorch implementation of DenseNet.

Overview

This is a PyTorch implementation of the DenseNet-BC architecture as described in the paper Densely Connected Convolutional Networks by G. Huang, Z. Liu, K. Weinberger, and L. van der Maaten. This implementation reaches a CIFAR-10+ error rate of 4.77% with a 100-layer DenseNet-BC and a growth rate of 12. The official implementation and links to many other third-party implementations are available in the liuzhuang13/DenseNet repo on GitHub.

Why DenseNet?

As the results table in the DenseNet paper shows, DenseNet provides competitive, state-of-the-art results on CIFAR-10, CIFAR-100, and SVHN.

Why yet another DenseNet implementation?

PyTorch is a great new framework, and it's nice to have these kinds of re-implementations around so that they can be integrated with other PyTorch projects.

How do you know this implementation is correct?

Interestingly, while implementing this I had a lot of trouble getting it to converge, and I looked at every part of the code more closely than I usually would. I compared all of the model's hidden states and gradients with the official implementation to make sure my code was correct, and I even trained a VGG-style network on CIFAR-10 with the training code here. It turns out that I had uncovered a new critical PyTorch bug (now fixed) that was causing the convergence failure.

I have left my original message describing the convergence problem, and everything I checked, in this document. I think it should be interesting for others to see the development and debugging strategies I use when a model that's known to converge isn't converging. I also started this PyTorch forum thread, which has a few other discussion points. You may also be interested in my script that compares PyTorch gradients to Torch gradients and my script that numerically checks PyTorch gradients.

My convergence issues were due to a critical PyTorch bug related to using torch.cat with convolutions when cuDNN is enabled (which it is by default whenever CUDA is used). The bug caused incorrect gradients, and the workaround at the time was to disable cuDNN (no longer necessary now that the bug is fixed). The oversight in my debugging strategy that kept me from finding the error sooner is that I never thought to disable cuDNN. Until now I had assumed that the cuDNN paths in frameworks are bug-free, but I have learned that this is not always the case. I might also have found the problem if I had numerically checked gradients through torch.cat layers with convolutions instead of through fully connected layers.
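
To illustrate the kind of numerical check that would have caught it, here is a minimal sketch (not the checking scripts linked above, and assuming a recent PyTorch) that verifies gradients through the torch.cat-plus-convolution pattern with torch.autograd.gradcheck:

    # Minimal sketch: numerically check gradients through torch.cat + conv2d,
    # the pattern every DenseNet layer relies on. Move x and w to .cuda() to
    # exercise the cuDNN code path.
    import torch
    import torch.nn.functional as F
    from torch.autograd import gradcheck

    def cat_conv(x, weight):
        h = torch.cat([x, x], dim=1)  # concatenate along the channel dimension
        return F.conv2d(h, weight, padding=1)

    # gradcheck wants double precision and requires_grad=True.
    x = torch.randn(1, 3, 8, 8, dtype=torch.double, requires_grad=True)
    w = torch.randn(4, 6, 3, 3, dtype=torch.double, requires_grad=True)
    print(gradcheck(cat_conv, (x, w)))  # True when analytic and numeric gradients agree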

Adam fixed the PyTorch bug that caused this in this PR, which has been merged into PyTorch's master branch. If you want to use the DenseNet code in this repository, make sure your PyTorch build contains this PR and was downloaded after 2017-02-10.

What does the PyTorch compute graph of the model look like?

You can see the compute graph here. I created it with make_graph.py, which I copied from Adam Paszke's gist; Adam says PyTorch will soon have a better way of creating compute graphs.

How does this implementation perform?

By default, this repo trains a 100-layer DenseNet-BC with a growth rate of 12 on the CIFAR-10 dataset with data augmentation. Due to GPU memory limits, this is the largest model I am able to run. The paper reports a final test error of 4.51% with this architecture; we obtain a final test error of 4.77%.
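
For reference, the training script constructs this default model as follows (the constructor signature follows densenet.py in this repo):

    # DenseNet-BC: depth 100, growth rate 12, 0.5 compression, bottleneck layers.
    import densenet

    net = densenet.DenseNet(growthRate=12, depth=100, reduction=0.5,
                            bottleneck=True, nClasses=10)
    print('  + Number of params: {}'.format(
        sum(p.data.nelement() for p in net.parameters())))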

Why don't people use ADAM instead of SGD for training ResNet-style models?

I also tried training a net with Adam and found that, with the default hyper-parameters, it didn't converge as well as SGD with a reasonable learning rate schedule.
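
For reference, the SGD settings used here (mirroring train.py) are a learning rate of 1e-1 with momentum 0.9 and weight decay 1e-4, dropped by 10x at epochs 150 and 225. Here is a sketch of that schedule, given the net constructed as in the snippet above:

    # SGD setup and learning-rate schedule mirroring train.py; Adam with its
    # default hyper-parameters converged to a worse solution in my runs.
    import torch.optim as optim

    optimizer = optim.SGD(net.parameters(), lr=1e-1, momentum=0.9, weight_decay=1e-4)

    def adjust_lr(optimizer, epoch):
        # 1e-1 until epoch 150, then 1e-2 until epoch 225, then 1e-3.
        if epoch < 150:
            lr = 1e-1
        elif epoch < 225:
            lr = 1e-2
        else:
            lr = 1e-3
        for param_group in optimizer.param_groups:
            param_group['lr'] = lr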

What about the non-BC version?

I haven't tested this as thoroughly, so you should make sure it's working as expected if you plan to use or modify it. Let me know if you find anything wrong with it.

A paradigm for ML code

I like to include a few features in my projects that I don't see in some other re-implementations, and they are present in this repo. The training code in train.py uses argparse, so the batch size and other hyper-parameters can easily be changed, and as the model trains, progress is written out to CSV files in a work directory that is also set by the arguments. A separate script, plot.py, then plots the progress written out by the training script. The training script calls plot.py after every epoch, but, importantly, it can also be run on its own so figures can be tweaked without re-running the entire experiment.
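
As a sketch of that workflow (this is not the repo's plot.py), the progress files can be read back and plotted on their own; each row of train.csv and test.csv written by the training script is (epoch, loss, error):

    # Minimal sketch, not the repo's plot.py: read the CSV progress files that
    # train.py writes and plot the train/test error curves.
    import numpy as np
    import matplotlib.pyplot as plt

    def plot_progress(work_dir='work/cifar10.base'):
        train = np.loadtxt('{}/train.csv'.format(work_dir), delimiter=',')
        test = np.loadtxt('{}/test.csv'.format(work_dir), delimiter=',')
        plt.plot(train[:, 0], train[:, 2], label='train error')
        plt.plot(test[:, 0], test[:, 2], label='test error')
        plt.xlabel('epoch')
        plt.ylabel('error (%)')
        plt.legend()
        plt.savefig('{}/error.png'.format(work_dir))

    plot_progress()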

Help wanted: Improving memory utilization and multi-GPU support

I think there are ways to improve the memory utilization in this code, as in the official space-efficient Torch implementation. I would also be interested in multi-GPU support.
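
As a starting point for the multi-GPU part, here is a hedged sketch (not currently in this repo) that wraps the model with nn.DataParallel when more than one GPU is visible:

    # Sketch only: split each batch across all visible GPUs with nn.DataParallel.
    import torch
    import torch.nn as nn
    import densenet

    net = densenet.DenseNet(growthRate=12, depth=100, reduction=0.5,
                            bottleneck=True, nClasses=10)
    if torch.cuda.is_available():
        if torch.cuda.device_count() > 1:
            net = nn.DataParallel(net)
        net = net.cuda()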

Running the code and viewing convergence

First install PyTorch (ideally in an anaconda3 distribution). ./train.py will create a model, start training it, and save progress to args.save, which is work/cifar10.base by default. The training script will call plot.py after every epoch to create plots from the saved progress.

Citations

The following is a BibTeX entry for the DenseNet paper that you should cite if you use this model.

@article{Huang2016Densely,
  author = {Huang, Gao and Liu, Zhuang and van der Maaten, Laurens and Weinberger, Kilian Q.},
  title = {Densely Connected Convolutional Networks},
  journal = {arXiv preprint arXiv:1608.06993},
  year = {2016}
}

If you use this implementation, please also consider citing this implementation and code repository with the following BibTeX or plaintext entry. The BibTeX entry requires the url LaTeX package.

@misc{amos2017densenet,
  title = {{A PyTorch Implementation of DenseNet}},
  author = {Amos, Brandon and Kolter, J. Zico},
  howpublished = {\url{https://github.com/bamos/densenet.pytorch}},
  note = {Accessed: [Insert date here]}
}

Brandon Amos, J. Zico Kolter
A PyTorch Implementation of DenseNet
https://github.com/bamos/densenet.pytorch.
Accessed: [Insert date here]

Licensing

This repository is Apache-licensed.

Comments
  • How do you convert the target from numpy arrays to Tensor?

    opened by acgtyrant 6
  • Add bug solution/fix to bug discussion page

    Hey,

    I think it would be good to include how your CIFAR-10 convergence problem was solved. At the moment the discussion page just includes problem details.

    Good to hear you got it working.

    opened by thundergolfer 3
  • How can it be run on CIFAR-100?

    I changed the dataset class name in the code from CIFAR10 to CIFAR100 but got several errors during loss.backward(), like CUDNN_STATUS_MAPPING_ERROR or "cublas runtime error: the GPU program failed to execute". So I guess there must be something specific to CIFAR-10 in this code, but I can't find it.

    opened by wishforgood 2
  • Is the DenseBlock Implementation correct?

    Looking at your DenseBlock implementation, I don't see how the activations of layers earlier than the immediately preceding one are propagated to the later layers. Is the implementation really the same as in the DenseNet paper?

    opened by anjishnu 2
  • Question - What is the purpose of this piece of Code in densenet.py?

    I'm learning CNN architectures. Can you please explain what this piece of code does and why it's needed? I could not relate it to the paper.

    for m in self.modules():
        if isinstance(m, nn.Conv2d):
            n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
            m.weight.data.normal_(0, math.sqrt(2. / n))
        elif isinstance(m, nn.BatchNorm2d):
            m.weight.data.fill_(1)
            m.bias.data.zero_()
        elif isinstance(m, nn.Linear):
            m.bias.data.zero_()

    opened by gokulanv 1
  • Why is there a PID on device 0 when I call cuda(1) everywhere?

    See the attached screenshot (2017-05-17-160814_1914x1005_scrot).

    I have changed train.py; as you can see, I call cuda(1) everywhere. So why does the same PID also show up on device 0? Am I missing something?

    #!/usr/bin/env python3
    
    import argparse
    import os
    import setproctitle
    import shutil
    
    import densenet
    import torch
    from torch import optim
    from torch.autograd import Variable
    from torch.nn import functional as F
    from torch.utils.data import DataLoader
    import torchvision
    from torchvision import transforms
    
    
    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument('--batchSz', type=int, default=64)
        parser.add_argument('--nEpochs', type=int, default=300)
        parser.add_argument('--no-cuda', action='store_false')
        parser.add_argument('--save')
        parser.add_argument('--seed', type=int, default=1)
        parser.add_argument(
                '--opt', type=str, default='sgd',
                choices=('sgd', 'adam', 'rmsprop'))
        args = parser.parse_args()
    
        args.cuda = args.no_cuda and torch.cuda.is_available()
        if args.cuda:
            torch.cuda.manual_seed(args.seed)
    
        args.save = args.save or 'work/densenet.base'
        setproctitle.setproctitle(args.save)
        if os.path.exists(args.save):
            shutil.rmtree(args.save)
        os.makedirs(args.save, exist_ok=True)
    
        torch.manual_seed(args.seed)
    
        normMean = [0.49139968, 0.48215827, 0.44653124]
        normStd = [0.24703233, 0.24348505, 0.26158768]
        normTransform = transforms.Normalize(normMean, normStd)
        trainTransform = transforms.Compose([
                transforms.RandomCrop(32, padding=4),
                transforms.RandomHorizontalFlip(),
                transforms.ToTensor(),
                normTransform
        ])
        testTransform = transforms.Compose([
                transforms.ToTensor(),
                normTransform
        ])
    
        kwargs = {'num_workers': 1, 'pin_memory': True} if args.cuda else {}
        trainLoader = DataLoader(
                torchvision.datasets.CIFAR10(
                        root='cifar',
                        train=True,
                        download=True,
                        transform=trainTransform),
                batch_size=args.batchSz, shuffle=True, **kwargs)
        testLoader = DataLoader(
                torchvision.datasets.CIFAR10(
                        root='cifar',
                        train=False,
                        download=True,
                        transform=testTransform),
                batch_size=args.batchSz, shuffle=False, **kwargs)
    
        net = densenet.DenseNet(
                growthRate=12,
                depth=100,
                reduction=0.5,
                bottleneck=True,
                nClasses=10)
    
        print('  + Number of params: {}'.format(
                sum([p.data.nelement() for p in net.parameters()])))
        if args.cuda:
            net = net.cuda(1)
    
        if args.opt == 'sgd':
            optimizer = optim.SGD(
                    net.parameters(), lr=1e-1, momentum=0.9, weight_decay=1e-4)
        elif args.opt == 'adam':
            optimizer = optim.Adam(net.parameters(), weight_decay=1e-4)
        elif args.opt == 'rmsprop':
            optimizer = optim.RMSprop(net.parameters(), weight_decay=1e-4)
    
        trainF = open(os.path.join(args.save, 'train.csv'), 'w')
        testF = open(os.path.join(args.save, 'test.csv'), 'w')
    
        for epoch in range(1, args.nEpochs + 1):
            adjust_opt(args.opt, optimizer, epoch)
            train(args, epoch, net, trainLoader, optimizer, trainF)
            test(args, epoch, net, testLoader, optimizer, testF)
            torch.save(net, os.path.join(args.save, 'latest.pth'))
            os.system('./plot.py {} &'.format(args.save))
    
        trainF.close()
        testF.close()
    
    
    def train(args, epoch, net, trainLoader, optimizer, trainF):
        net.train()
        nProcessed = 0
        nTrain = len(trainLoader.dataset)
        for batch_idx, (data, target) in enumerate(trainLoader):
            if args.cuda:
                data, target = data.cuda(1), target.cuda(1)
            data, target = Variable(data), Variable(target)
            optimizer.zero_grad()
            output = net(data)
            loss = F.nll_loss(output, target)
            # make_graph.save('/tmp/t.dot', loss.creator); assert(False)
            loss.backward()
            optimizer.step()
            nProcessed += len(data)
            pred = output.data.max(1)[1]
            # get the index of the max log-probability
            incorrect = pred.ne(target.data).cpu().sum()
            err = 100.0 * incorrect / len(data)
            partialEpoch = epoch + batch_idx / len(trainLoader) - 1
            print(
                    'Train Epoch: {:.2f} [{}/{}\t'
                    '({:.0f}%)]\n'
                    'Loss: {:.6f}\t' 'Error: {:.6f}'.format(
                            partialEpoch, nProcessed, nTrain,
                            100. * batch_idx / len(trainLoader),
                            loss.data[0], err))
            trainF.write('{},{},{}\n'.format(partialEpoch, loss.data[0], err))
            trainF.flush()
    
    
    def test(args, epoch, net, testLoader, optimizer, testF):
        net.eval()
        test_loss = 0
        incorrect = 0
        for data, target in testLoader:
            if args.cuda:
                data, target = data.cuda(1), target.cuda(1)
            data, target = Variable(data, volatile=True), Variable(target)
            output = net(data)
            test_loss += F.nll_loss(output, target).data[0]
            pred = output.data.max(1)[1]
            # get the index of the max log-probability
            incorrect += pred.ne(target.data).cpu().sum()
        test_loss = test_loss
        test_loss /= len(testLoader)
        # loss function already averages over batch size
        nTotal = len(testLoader.dataset)
        err = 100.0 * incorrect / nTotal
        print()
        print(
                'Test set: Average loss: {:.4f}\n'
                'Error: {}/{} ({:.0f}%)\n'.format(
                        test_loss,
                        incorrect, nTotal, err))
    
        testF.write('{},{},{}\n'.format(epoch, test_loss, err))
        testF.flush()
    
    
    def adjust_opt(optAlg, optimizer, epoch):
        if optAlg == 'sgd':
            if epoch < 150:
                lr = 1e-1
            elif epoch == 150:
                lr = 1e-2
            elif epoch == 225:
                lr = 1e-3
            else:
                return
    
            for param_group in optimizer.param_groups:
                param_group['lr'] = lr
    
    if __name__ == '__main__':
        main()
    
    opened by acgtyrant 1
  • How did you create the header.png?

    I'm quite impressed with how you've presented your densenet implementation.

    V-Net is a bit messier in terms of needing substantial preprocessing of the data set, a custom loader, and a custom loss function. Nonetheless, I'm patterning the presentation of my implementation https://github.com/mattmacy/vnet.pytorch after yours, and I'm wondering how you created the header.png image.

    Thanks in advance.

    opened by mattmacy 1
  • DataParallel (multi-gpu) support

    This re-factoring may well be too intrusive for your taste. I wanted to quickly test DataParallel support on a network that I knew worked well before applying it to my own work. I hope it's useful to you.

    • re-factor the DenseNet constructor so it can be passed to nn.parallel.data_parallel
    • scale batch size with the number of gpus
    • make plot.py robust to larger batch sizes
    • move weight initialization to train.py and add optional kaiming fan-in initialization
    opened by mattmacy 0
  • There is a size mismatch due to these lines

    https://github.com/bamos/densenet.pytorch/blob/d1cd5e1957975628286e516512c6d1c14430f810/densenet.py#L113-L114

    I solved the issue by changing those lines to:

    out = self.dense3(out)
    out = self.relu(self.bn1(out))
    out = F.avg_pool2d(out, 8)
    out = out.view(-1, self.nChannels)

    where self.relu has been initialised as self.relu = nn.ReLU(inplace=True)

    opened by exponentialR 0
  • Error when loading the model saved

    Hi,

    I modified your code to train a model with my own dataset, and I am trying to load the model saved as "latest.pth" to do some tests. However, I am getting this error:

    AttributeError: 'DenseNet' object has no attribute 'copy'

    The code I use to load the model is:

    net.load_state_dict(torch.load(checkpoint, map_location=lambda storage, loc: storage))

    where checkpoint is the path to "latest.pth"

    Any help would be appreciated.

    Thanks

    opened by ParusMajor60 1
  • RuntimeError: size mismatch, m1: [2394 x 7], m2: [16758 x 2] at /opt/conda/conda-bld/pytorch_1532502421238/work/aten/src/TH/generic/THTensorMath.cpp:2070

    When fine-tuning on my dataset (96963), I get this error:

    RuntimeError: size mismatch, m1: [2394 x 7], m2: [16758 x 2] at /opt/conda/conda-bld/pytorch_1532502421238/work/aten/src/TH/generic/THTensorMath.cpp:2070

    opened by HuangQinJian 0
  • Cat vs Dog

    I have slightly modified your algorithm and adapted it for two classes (k = 12, reduction = 0.5, bottleneck = True). When I train it on the cat and dog images from CIFAR-10, I only reach 82% validation accuracy. Is that what you get as well, or do you get something closer to the accuracy for all 10 classes, i.e. > 95%?

    opened by Miladiouss 0
  • nChannels not according to paper

    Hi @bamos,

    According to the paper, the initial number of channels is twice the growth rate only for the BC-type DenseNets. Since you allow the non-bottleneck variant, that should be checked before setting nChannels; if not BC, the initial number of channels should be 16.

    I also included a question about the forward method, and a parameter-count sanity check that does not match the BC numbers reported by the authors.

    Regards, Pablo

    opened by PabloRR100 0
  • Help needed on reproducing the performance on Cifar-100

    I used the default setting (which I think is DenseNet-BC with growth rate 12 and data augmentation) on CIFAR-100, changing only the name of the dataset class and the nClasses variable. The training curve looks like this: [training curve image]. Though training has not finished yet, from the training curves of other networks on CIFAR-100 I can tell there will be no more major changes in accuracy. The highest accuracy so far is 75.59%, which only matches the reported performance of DenseNet (depth 40, growth rate 12) with data augmentation. Has anyone tested this repo on CIFAR-100 yet?

    opened by wishforgood 3