Wide Residual Networks

3.8% and 18.3% test error on CIFAR-10 and CIFAR-100

Overview

This code was used for experiments with Wide Residual Networks (BMVC 2016) http://arxiv.org/abs/1605.07146 by Sergey Zagoruyko and Nikos Komodakis.

Deep residual networks were shown to scale up to thousands of layers and still improve in performance. However, each fraction of a percent of additional accuracy costs nearly doubling the number of layers, so training very deep residual networks suffers from diminishing feature reuse, which makes these networks very slow to train.

To tackle these problems, in this work we conduct a detailed experimental study of the architecture of ResNet blocks, based on which we propose a novel architecture where we decrease the depth and increase the width of residual networks. We call the resulting network structures wide residual networks (WRNs) and show that they are far superior to their commonly used thin and very deep counterparts.

For example, we demonstrate that even a simple 16-layer-deep wide residual network outperforms all previous deep residual networks in accuracy and efficiency, including thousand-layer-deep networks. We further show that WRNs achieve incredibly good results (e.g., new state-of-the-art results on CIFAR-10, CIFAR-100, SVHN and COCO, and substantial improvements on ImageNet) and train several times faster than pre-activation ResNets.

Update (August 2019): Pretrained ImageNet WRN models are available in torchvision 0.4 and PyTorch Hub, e.g. loading WRN-50-2:

model = torch.hub.load('pytorch/vision', 'wide_resnet50_2', pretrained=True)
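
As a quick sanity check, the loaded model can be run on a dummy batch. This is a minimal sketch rather than anything from the original instructions; the 224x224 input size and the ImageNet mean/std values mentioned in the comments are the standard torchvision ones.

    import torch

    # Load the pretrained ImageNet WRN-50-2 from torchvision via torch.hub.
    model = torch.hub.load('pytorch/vision', 'wide_resnet50_2', pretrained=True)
    model.eval()

    # Dummy forward pass; real images should be 224x224 crops normalized with
    # the usual torchvision ImageNet statistics, mean (0.485, 0.456, 0.406)
    # and std (0.229, 0.224, 0.225).
    with torch.no_grad():
        logits = model(torch.randn(1, 3, 224, 224))
    print(logits.shape)  # torch.Size([1, 1000])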

Update (November 2016): We updated the paper with ImageNet, COCO and meanstd preprocessing CIFAR results. If you're comparing your method against WRN, please report correct preprocessing numbers because they give substantially different results.

tl;dr: ImageNet WRN-50-2-bottleneck (ResNet-50 with a wider inner 3x3 convolution in the bottleneck) is significantly faster than ResNet-152 and has better accuracy; on CIFAR, meanstd preprocessing (as in fb.resnet.torch) gives better results than ZCA whitening; on COCO, a wide ResNet with 34 layers outperforms even the Inception-v4-based Fast-RCNN model in single-model performance.

Test error (%, flip/translation augmentation, meanstd normalization, median of 5 runs) on CIFAR:

Network CIFAR-10 CIFAR-100
pre-ResNet-164 5.46 24.33
pre-ResNet-1001 4.92 22.71
WRN-28-10 4.00 19.25
WRN-28-10-dropout 3.89 18.85

Single-time runs (meanstd normalization):

Dataset network test perf.
CIFAR-10 WRN-40-10-dropout 3.8%
CIFAR-100 WRN-40-10-dropout 18.3%
SVHN WRN-16-8-dropout 1.54%
ImageNet (single crop) WRN-50-2-bottleneck 21.9% top-1, 5.79% top-5
COCO-val5k (single model) WRN-34-2 36 mAP
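
The flip/translation augmentation and meanstd normalization used for the CIFAR numbers above correspond to the usual pad-and-crop pipeline. Below is a minimal PyTorch sketch of that pipeline; the channel statistics are the commonly quoted CIFAR-10 values, not numbers taken from this repo, so treat them as illustrative.

    import torchvision.transforms as T

    # Commonly used CIFAR-10 per-channel statistics (illustrative; the repo
    # computes its own meanstd values during preprocessing).
    CIFAR10_MEAN = (0.4914, 0.4822, 0.4465)
    CIFAR10_STD = (0.2470, 0.2435, 0.2616)

    # Flip/translation augmentation: pad by 4 pixels, take a random 32x32 crop,
    # randomly flip horizontally, then apply meanstd normalization.
    train_transform = T.Compose([
        T.RandomCrop(32, padding=4),
        T.RandomHorizontalFlip(),
        T.ToTensor(),
        T.Normalize(CIFAR10_MEAN, CIFAR10_STD),
    ])

    # Test images are only normalized.
    test_transform = T.Compose([
        T.ToTensor(),
        T.Normalize(CIFAR10_MEAN, CIFAR10_STD),
    ])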

See http://arxiv.org/abs/1605.07146 for details.

BibTeX:

@INPROCEEDINGS{Zagoruyko2016WRN,
    author = {Sergey Zagoruyko and Nikos Komodakis},
    title = {Wide Residual Networks},
    booktitle = {BMVC},
    year = {2016}}

Pretrained models

ImageNet

WRN-50-2-bottleneck (wider bottleneck), see pretrained for details
Download (263MB): https://yadi.sk/d/-8AWymOPyVZns

There are also PyTorch and TensorFlow model definitions with pretrained weights at https://github.com/szagoruyko/functional-zoo/blob/master/wide-resnet-50-2-export.ipynb

COCO

Coming

Installation

The code depends on Torch http://torch.ch. Follow the installation instructions there and run:

luarocks install torchnet
luarocks install optnet
luarocks install iterm

For visualizing training curves we used an IPython notebook with pandas and bokeh.

Usage

Dataset support

The code supports loading simple datasets in torch format, including CIFAR-10, CIFAR-100 and SVHN.

To whiten CIFAR-10 and CIFAR-100 we used the pylearn2 script https://github.com/lisa-lab/pylearn2/blob/master/pylearn2/scripts/datasets/make_cifar10_gcn_whitened.py, then converted the result to torch format using https://gist.github.com/szagoruyko/ad2977e4b8dceb64c68ea07f6abf397b and the npy-to-torch converter https://github.com/htwaijry/npy4th.
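
For readers who prefer not to install pylearn2, the following numpy sketch illustrates what global contrast normalization followed by ZCA whitening does; the epsilon and scaling choices here are assumptions and may differ from those in the pylearn2 script used for the paper.

    import numpy as np

    def gcn_zca_whiten(X, eps=1e-2):
        # X: (N, D) float array of flattened training images.
        # Global contrast normalization: per-image mean subtraction and scaling.
        X = X - X.mean(axis=1, keepdims=True)
        X = X / np.sqrt((X ** 2).mean(axis=1, keepdims=True) + eps)
        # ZCA whitening: decorrelate pixels using the training-set covariance.
        mean = X.mean(axis=0)
        Xc = X - mean
        cov = Xc.T @ Xc / Xc.shape[0]
        U, S, _ = np.linalg.svd(cov)
        W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T
        # Return whitened data plus the statistics needed for test data.
        return Xc @ W, mean, W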

We are running ImageNet experiments and will update the paper and this repo soon.

Training

We provide several scripts for reproducing the results in the paper; a few examples follow.

model=wide-resnet widen_factor=4 depth=40 ./scripts/train_cifar.sh

This will train WRN-40-4 on whitened CIFAR-10 (expected to be in the datasets folder). This network achieves about the same accuracy as ResNet-1001 and trains in 6 hours on a single Titan X. The log is saved to a logs/wide-resnet_$RANDOM$RANDOM folder with a JSON entry for each epoch and can be visualized with itorch/ipython later.

For reference we provide logs for this experiment and ipython notebook to visualize the results. After running it you should see these training curves:

[training curves figure]
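
For a quick look at such a log without the notebook, the per-epoch JSON entries can be loaded with pandas. This is a minimal sketch: it assumes one JSON object per line, and the log path and field names (epoch, train_acc, test_acc) are placeholders to adjust to the actual log contents.

    import json
    import pandas as pd

    # Placeholder path; point it at the log produced by train_cifar.sh.
    log_path = 'logs/wide-resnet_XXXX/log.txt'

    with open(log_path) as f:
        records = [json.loads(line) for line in f if line.strip().startswith('{')]

    df = pd.DataFrame(records)
    print(df.tail())

    # Plot training curves (field names are assumptions).
    ax = df.plot(x='epoch', y=['train_acc', 'test_acc'])
    ax.set_ylabel('accuracy (%)')
    ax.figure.savefig('curves.png')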

Another example:

model=wide-resnet widen_factor=10 depth=28 dropout=0.3 dataset=./datasets/cifar100_whitened.t7 ./scripts/train_cifar.sh

This network achieves 20.0% error on CIFAR-100 in about a day on a single Titan X.

Multi-GPU training is supported with the nGPU=n parameter.

Other models

Additional model definitions are available in the models folder of this repo.

Implementation details

The code evolved from https://github.com/szagoruyko/cifar.torch. To reduce memory usage we use @fmassa's optimize-net, which automatically shares output and gradient tensors between modules. This keeps memory usage below 4 GB even for our best networks. It can also generate network graph plots like the one for WRN-16-2 at the end of this page.

Acknowledgements

We thank the startup company VisionLabs and Eugenio Culurciello for giving us access to their clusters, without which the ImageNet experiments would not have been possible. We also thank Adam Lerer and Sam Gross for helpful discussions. This work was supported by EC project FP7-ICT-611145 ROBOSPECT.

Comments
  • Why do we have a seed = 444?

    Please, can someone tell me why we have seed = 444 after the last commit?

    I don't believe this makes the weight initialization deterministic; otherwise we would always get the same result for a given experiment, and (thank god) that is not happening, since I believe making the weight initialization deterministic should NOT be done.

    Thanks in advance for any help.

    David

    opened by dlmacedo 12
  • COCO/ImageNet Implementation

    In your wide-resnet.lua, line 95: "--one conv at the beginning (spatial size: 32x32)". I can see your model has an input size of 32x32 (CIFAR-10/100 size) and later performs average pooling on an 8x8 input. How do you train your models on larger-input datasets such as COCO and ImageNet? Do you just modify the average pooling layer and the first convolutional layer?

    opened by Matheusih 8
  • "resume" option in pytorch code

    Thank you for your great code!

    I tried your PyTorch code with the following command (only 10 epochs because my PC is slow): python main.py --save ./logs/resnet_02c --depth 10 --width 10 --epochs 10

    After that, I tried the "resume" option with the following command: python main.py --save ./logs/resnet_02c --depth 10 --width 10 --epochs 12 --resume logs/resnet_02c/model.pt7

    Unfortunately, it produced the following error. Is the "resume" option working in your environment?

    ...
    Total number of parameters: 7435354
      0%|                                                                  | 0/391 [00:00<?, ?it/s]
    Traceback (most recent call last):
      File "main.py", line 216, in <module>
        main()
      File "main.py", line 212, in main
        engine.train(h, train_loader, opt.epochs, optimizer)
      File "/home/itsukara/anaconda2/envs/pytorch3/lib/python3.5/site-packages/torchnet/engine/engine.py", line 39, in train
        state['optimizer'].step(closure)
      File "/home/itsukara/anaconda2/envs/pytorch3/lib/python3.5/site-packages/torch/optim/sgd.py", line 94, in step
        buf.mul_(momentum).add_(1 - dampening, d_p)
    RuntimeError: invalid argument 3: sizes do not match at /opt/conda/conda-bld/pytorch_1513366702650/work/torch/lib/THC/generated/../generic/THCTensorMathPointwise.cu:271
    
    opened by Itsukara 7
  • Training stops randomly

    I have absolutely no idea why, but when I run the script, it usually trains for some random small number of epochs (I haven't yet made it past 9) and then training just stops, completely at random, without any warning, any interaction on my part, or anything else. While I can usually get out of training with CTRL+C, when it gets stuck I have to press CTRL+C twice to get out; that may give a clue.

    I'm using CuDNN v5.1 on Ubuntu 16.04, and the scripts I wrote with Torch never get stuck like that. I thought it might be due to some kernel failure, so I updated to kernel 4.8, but I still get the stops.

    opened by ibmua 7
  • PyTorch results reproduction

    Hi,

    Thanks a lot for your work and for providing the PyTorch implementation. I ran WRN-40-10-dropout twice on CIFAR-100 using the PyTorch code you provided, yet I couldn't reach an error rate smaller than 19%.

    Any ideas?

    Thanks

    opened by abduallahmohamed 6
  • Wide Residual Networks on imagenet

    Hi,

    Is there a version that works on ImageNet? I tried using the PyTorch model definition, but I think it only works if input images are 32x32. Thank you!

    opened by antspy 6
  • Fix initialization to use MSRA init

    I believe the stddev of the initialized weights is a factor of sqrt(2) too high in the current PyTorch implementation.

    For comparison, the current Lua Torch implementation uses sqrt(2) rather than 2 in the numerator, which is what I would expect from He/MSRA initialization, since Var = 2 / fan_in. https://github.com/szagoruyko/wide-residual-networks/blob/master/models/utils.lua#L6

    Does this seem right to you? I unfortunately don't have the time or resources to re-run the code right now.

    opened by juesato 3
  • optnet issue

    Hi,

    I could run train.lua normally. However, After rebooting my PC, when I tried to run train.lua, I faced this problem.

    /home/youngwan/torch/install/bin/luajit: /home/youngwan/torch/install/share/lua/5.1/trepl/init.lua:384: module 'optnet' not found:
    No LuaRocks module found for optnet
      no field package.preload['optnet']
      no file '/home/youngwan/.luarocks/share/lua/5.1/optnet.lua'
      no file '/home/youngwan/.luarocks/share/lua/5.1/optnet/init.lua'
      no file '/home/youngwan/torch/install/share/lua/5.1/optnet.lua'
      no file '/home/youngwan/torch/install/share/lua/5.1/optnet/init.lua'
      no file './optnet.lua'
      no file '/home/youngwan/torch/install/share/luajit-2.1.0-beta1/optnet.lua'
      no file '/usr/local/share/lua/5.1/optnet.lua'
      no file '/usr/local/share/lua/5.1/optnet/init.lua'
      no file '/home/youngwan/.luarocks/lib/lua/5.1/optnet.so'
      no file '/home/youngwan/torch/install/lib/lua/5.1/optnet.so'
      no file '/home/youngwan/torch/install/lib/optnet.so'
      no file './optnet.so'
      no file '/usr/local/lib/lua/5.1/optnet.so'
      no file '/usr/local/lib/lua/5.1/loadall.so'
    stack traceback:
      [C]: in function 'error'
      /home/youngwan/torch/install/share/lua/5.1/trepl/init.lua:384: in function 'require'
      train.lua:14: in main chunk
      [C]: in function 'dofile'
      ...gwan/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
      [C]: at 0x004065d0

    Checking for optnet.so files in /torch/install/lib..., there were no optnet libs, so I tried 'luarocks install optnet' again.

    -- The C compiler identification is GNU 4.8.5
    -- The CXX compiler identification is GNU 4.8.5
    -- Check for working C compiler: /usr/bin/cc
    -- Check for working C compiler: /usr/bin/cc -- works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Check for working CXX compiler: /usr/bin/c++
    -- Check for working CXX compiler: /usr/bin/c++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Found Torch7 in /usr
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /tmp/luarocks_optnet-scm-1-7703/optimize-net/build
    cd build && make install
    Install the project...
    -- Install configuration: "Release"
    -- Installing: /usr/lib/luarocks/rocks/optnet/scm-1/lua/optnet/utils.lua
    -- Installing: /usr/lib/luarocks/rocks/optnet/scm-1/lua/optnet/example.lua
    -- Installing: /usr/lib/luarocks/rocks/optnet/scm-1/lua/optnet/models.lua
    -- Installing: /usr/lib/luarocks/rocks/optnet/scm-1/lua/optnet/init.lua
    -- Installing: /usr/lib/luarocks/rocks/optnet/scm-1/lua/optnet/graphgen.lua
    -- Installing: /usr/lib/luarocks/rocks/optnet/scm-1/lua/optnet/env.lua
    -- Installing: /usr/lib/luarocks/rocks/optnet/scm-1/lua/optnet/countUsedMemory.lua
    -- Installing: /usr/lib/luarocks/rocks/optnet/scm-1/lua/optnet/tests.lua
    Updating manifest for /usr/lib/luarocks/rocks

    The terminal displays this, but the problem occurs again.

    opened by youngwanLEE 3
  • Training without data augmentation

    Hi, szagoruyko!

    Excellent code for learning practice with Torch! I want to do some experiments without data augmentation. What should I do? Just set opt.hflip = false and opt.randomcrop = 0?

    Thank you!

    opened by huangzehao 3
  • Why the depth is 6n+4, rather than 6n+2?

    Thanks for your great work!

    I wonder why the depth of the network is 6n+4 for WRN on CIFAR. I understand that 6n means there are n blocks with 6 layers per block. Besides, there is an extra convolutional layer before all blocks and a linear layer after all blocks, so I think the depth should be 6n+2, the same as the original ResNet.

    Hope for your reply!

    opened by cheerss 2
  • Why only 16 output channels in the first convolution?

    Hi,

    Is it possible to get better accuracy in a wide ResNet by using more output channels in the first convolution layer, like the 64 or 128 that the other convolutions get?

    thanks

    opened by ghost 2
  • Error

    I ran with default parameters, and I'm getting a Can't pickle error. Is this perhaps a version problem?

    Number of model parameters: 36479194
    Traceback (most recent call last):
      File "train.py", line 283, in <module>
        main()
      File "train.py", line 136, in main
        train(train_loader, model, criterion, optimizer, scheduler, epoch)
      File "train.py", line 161, in train
        for i, (input, target) in enumerate(train_loader):
      File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 279, in __iter__
        return _MultiProcessingDataLoaderIter(self)
      File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 719, in __init__
        w.start()
      File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 112, in start
        self._popen = self._Popen(self)
      File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
        return Popen(process_obj)
      File "C:\ProgramData\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
        reduction.dump(process_obj, to_child)
      File "C:\ProgramData\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
        ForkingPickler(file, protocol).dump(obj)
    AttributeError: Can't pickle local object 'main..'
    170500096it [00:18, 9101116.56it/s]

    opened by tdietterich 0
  • The difference between the original version and the latest version

    Hi, recently I read the articles about wide residual networks. I found that the latest version's top-1 error is slightly better than in the first version: 4.00% vs 4.17%, 4.27% vs 4.81%, 4.53% vs 4.97%. I can't find the reason in the articles. Could you tell me why? @szagoruyko

    opened by Itsanewday 3
  • Why is the number of blocks per group different in Imagenet?

    In contrast to the CIFAR code, for ImageNet the number of blocks per group is not calculated automatically but provided manually, and the third group gets the most residual blocks, as shown in the code snippet below. What is the reason behind this distribution of blocks? Have you also tried using the same number of blocks per group, as is done for the CIFAR datasets?

    local cfg = {
       [18]  = {{2, 2, 2, 2}, 512*width, basicblock}, -- leave as is
       [34]  = {{3, 4, 6, 3}, 512*width, basicblock}, -- leave as is
       [50]  = {{3, 4, 6, 3}, 512*bottle, bottleneck},
       [101] = {{3, 4, 23, 3}, 512*bottle, bottleneck},
       [152] = {{3, 8, 36, 3}, 512*bottle, bottleneck},
    }
    
    opened by Rahim16 0
  • How to set the Dropout rate?

    Hi szagoruyko, from your paper I know dropout is useful for ResNet, but you don't mention how to set the dropout rate. Can you tell me the principle for setting it, or do you just try different rates, like 0.2, 0.3, ..., 0.5? Thank you very much!

    opened by anyujh 0