ConvNet training using PyTorch

Elad Hoffer

Last update: Dec 30, 2022

Overview

Convolutional networks using PyTorch

This is a complete training example for Deep Convolutional Networks on various datasets (ImageNet, CIFAR-10, CIFAR-100, MNIST).

Available models include:

'alexnet', 'amoebanet', 'darts', 'densenet', 'googlenet', 'inception_resnet_v2', 'inception_v2', 'mnist', 'mobilenet', 'mobilenet_v2', 'nasnet', 'resnet', 'resnet_se', 'resnet_zi', 'resnet_zi_se', 'resnext', 'resnext_se'

It is based on the ImageNet example in PyTorch, with helpful additions such as:

Training on several datasets other than ImageNet
Complete logging of trained experiments
Graph visualization of the training/validation loss and accuracy
Definition of preprocessing and optimization regimes for each model
Distributed training

To clone:

```
git clone --recursive https://github.com/eladhoffer/convNet.pytorch
```

Example of efficient multi-GPU training of ResNet50 (4 GPUs, label smoothing):

```
python -m torch.distributed.launch --nproc_per_node=4 main.py --model resnet --model-config "{'depth': 50}" --eval-batch-size 512 --save resnet50_ls --label-smoothing 0.1
```
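The `--label-smoothing 0.1` flag mixes the one-hot target with a uniform distribution over classes. A minimal sketch of label-smoothed cross-entropy, assuming the common formulation in which the true class receives 1 - ε + ε/K and every other class ε/K; this is an illustration, not necessarily the repository's exact implementation:

```python
import torch
import torch.nn.functional as F


def smoothed_cross_entropy(logits, target, smoothing=0.1):
    """Cross-entropy against targets smoothed toward the uniform distribution.

    logits: (N, K) unnormalized scores; target: (N,) integer class labels.
    """
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    # Every class gets smoothing / K; the true class additionally gets 1 - smoothing
    soft_target = torch.full_like(log_probs, smoothing / num_classes)
    soft_target.scatter_(1, target.unsqueeze(1), 1.0 - smoothing + smoothing / num_classes)
    return -(soft_target * log_probs).sum(dim=-1).mean()


torch.manual_seed(0)
logits = torch.randn(8, 10)
target = torch.randint(0, 10, (8,))
loss = smoothed_cross_entropy(logits, target, smoothing=0.1)
```

With `smoothing=0` this reduces to plain cross-entropy. Recent PyTorch versions (1.10 and later) also accept a `label_smoothing` argument in `nn.CrossEntropyLoss` with the same formulation.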

This code can be used to implement several recent papers:

Hoffer et al. (2018): Fix your classifier: the marginal value of training the last weight layer
Hoffer et al. (2018): Norm matters: efficient and accurate normalization schemes in deep networks

For example, training ResNet18 with L1 batch-norm (instead of the standard batch-norm):
```
python main.py --model resnet --model-config "{'depth': 18, 'bn_norm': 'L1'}" --save resnet18_l1 -b 128
```
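In the *Norm matters* paper, L1 batch-norm replaces the standard deviation with the mean absolute deviation, rescaled by sqrt(π/2) so that the two agree for Gaussian inputs. A minimal training-mode sketch for `(N, C, H, W)` inputs, assuming no running statistics and no affine parameters; the repository's layer selected with `'bn_norm': 'L1'` may differ in these details:

```python
import math

import torch


def l1_batch_norm(x, eps=1e-5):
    """Normalize each channel of an (N, C, H, W) tensor using the L1 deviation."""
    mean = x.mean(dim=(0, 2, 3), keepdim=True)
    centered = x - mean
    # For Gaussian data E|x - mu| = sigma * sqrt(2 / pi), so rescale by sqrt(pi / 2)
    scale = centered.abs().mean(dim=(0, 2, 3), keepdim=True) * math.sqrt(math.pi / 2)
    return centered / (scale + eps)


torch.manual_seed(0)
x = 3.0 * torch.randn(64, 4, 16, 16) + 1.0
y = l1_batch_norm(x)  # per-channel mean ~0 and standard deviation ~1
```

Avoiding the square and square root makes the L1 variant cheaper and, as the paper argues, more tolerant of low-precision arithmetic.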
Banner et al. (2018): Scalable Methods for 8-bit Training of Neural Networks

For example, training ResNet18 with 8-bit quantization:
```
python main.py --model resnet --model-config "{'depth': 18, 'quantize':True}" --save resnet18_8bit -b 64
```
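As a rough illustration of 8-bit training, here is a sketch of uniform min-max "fake" quantization (quantize, then immediately dequantize) with optional stochastic rounding. The per-tensor min/max range is an assumption of this sketch; the repository's quantizer may compute its range differently:

```python
import torch


def fake_quantize(x, num_bits=8, stochastic=False):
    """Map x onto 2**num_bits uniform levels between its min and max, then back."""
    qmax = 2 ** num_bits - 1
    x_min, x_max = x.min(), x.max()
    scale = (x_max - x_min).clamp(min=1e-8) / qmax
    q = (x - x_min) / scale  # now in [0, qmax]
    if stochastic:
        # Uniform noise in [-0.5, 0.5) before rounding: q rounds up with
        # probability equal to its fractional part (stochastic rounding)
        q = q + torch.empty_like(q).uniform_(-0.5, 0.5)
    q = q.round().clamp(0, qmax)
    return q * scale + x_min


torch.manual_seed(0)
x = torch.randn(1000)
xq = fake_quantize(x, num_bits=8)  # at most 256 distinct values
```

With round-to-nearest the error per element is at most half a quantization step; stochastic rounding trades a slightly larger error for rounding that is unbiased on average.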
Hoffer et al. (2020): Augment Your Batch: Improving Generalization Through Instance Repetition

For example, training the ResNet44 + cutout experiment from the paper (40 duplicates per sample):
```
python main.py --dataset cifar10 --model resnet --model-config "{'depth': 44}"  --duplicates 40 --cutout -b 64 --epochs 100 --save resnet44_cutout_m-40
```
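With `--duplicates 40`, each image appears in the batch several times, and each copy receives its own random augmentation. A minimal sketch of building such a batch; the flip-plus-cutout `augment` function is a hypothetical stand-in for the repository's augmentation pipeline:

```python
import torch


def augment(img, cutout_size=8):
    """Hypothetical per-sample augmentation: random horizontal flip + cutout."""
    if torch.rand(1).item() < 0.5:
        img = img.flip(-1)
    img = img.clone()
    _, h, w = img.shape
    cy = torch.randint(0, h, (1,)).item()
    cx = torch.randint(0, w, (1,)).item()
    y0, y1 = max(0, cy - cutout_size // 2), min(h, cy + cutout_size // 2)
    x0, x1 = max(0, cx - cutout_size // 2), min(w, cx + cutout_size // 2)
    img[:, y0:y1, x0:x1] = 0  # zero out a square patch
    return img


def batch_augment(images, labels, duplicates):
    """Repeat every sample `duplicates` times, augmenting each copy independently."""
    out_images = torch.stack([augment(img) for img in images for _ in range(duplicates)])
    out_labels = labels.repeat_interleave(duplicates)
    return out_images, out_labels


images = torch.randn(4, 3, 32, 32)
labels = torch.tensor([0, 1, 2, 3])
big_images, big_labels = batch_augment(images, labels, duplicates=3)
```

The copies of each sample stay adjacent, and `repeat_interleave` keeps the labels aligned with them.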
Hoffer et al. (2019): Mix & Match: training convnets with mixed image sizes for improved accuracy, speed and scale resiliency

For example, training ResNet44 with mixed image sizes, as in the paper:
```
python main.py --model resnet --dataset cifar10 --save cifar10_mixsize_d -b 64 --model-config "{'regime': 'sampled_D+'}" --epochs 200
```
Then calibrate the batch-norm statistics for a specific input size and evaluate:
```
python evaluate.py ./results/cifar10_mixsize_d/checkpoint.pth.tar --dataset cifar10 -b 64 --input-size 32 --calibrate-bn
```
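A minimal sketch of batch-norm calibration, assuming it amounts to resetting the running statistics and recomputing them with forward passes at the target input size in training mode, without any gradient updates. The exact procedure behind `--calibrate-bn` may differ:

```python
import torch
import torch.nn as nn


@torch.no_grad()
def calibrate_bn(model, batches):
    """Recompute BatchNorm running statistics from the given input batches."""
    bn_types = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)
    bn_layers = [m for m in model.modules() if isinstance(m, bn_types)]
    saved_momentum = [m.momentum for m in bn_layers]
    for m in bn_layers:
        m.reset_running_stats()
        m.momentum = None  # None -> cumulative average over all calibration batches
    model.train()  # BN layers update running stats only in training mode
    for x in batches:
        model(x)
    for m, momentum in zip(bn_layers, saved_momentum):
        m.momentum = momentum  # restore the original setting
    model.eval()
    return model


torch.manual_seed(0)
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())
batches = [torch.randn(16, 3, 32, 32) for _ in range(4)]  # e.g. 32x32 inputs
calibrate_bn(model, batches)
```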
Pretrained models (ResNet50, ImageNet) are also available here

Dependencies

pytorch
torchvision, to load the datasets and perform image transforms
pandas, for logging to CSV
bokeh, for training visualization

Data

Configure your dataset path with the --datasets-dir argument.
To get the ILSVRC data, you should register on their site for access: http://www.image-net.org/

Model configuration

A network model is defined by writing a .py file in the models folder and selecting it with the --model flag. The model function must be registered in models/__init__.py and must return a trainable network. It can also specify additional training options, such as an optimization regime (either a dictionary or a function) and input transform modifications.

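For example, a model definition:

```
import torch.nn as nn


class Model(nn.Module):

    def __init__(self, num_classes=1000):
        super(Model, self).__init__()
        # Placeholder layers: replace with your own architecture
        self.model = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, num_classes),
        )

        # Optimization regime: each entry takes effect from its 'epoch' onward
        self.regime = [
            {'epoch': 0, 'optimizer': 'SGD', 'lr': 1e-2,
                'weight_decay': 5e-4, 'momentum': 0.9},
            {'epoch': 15, 'lr': 1e-3, 'weight_decay': 0}
        ]

        # Data regime: input size and batch size per training phase
        self.data_regime = [
            {'epoch': 0, 'input_size': 128, 'batch_size': 256},
            {'epoch': 15, 'input_size': 224, 'batch_size': 64}
        ]

    def forward(self, inputs):
        return self.model(inputs)


def model(**kwargs):
    # Model function registered in models/__init__.py and selected with --model
    return Model()
```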
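The regime lists above read as piecewise schedules. A minimal sketch of resolving the settings active at a given epoch, assuming later entries override earlier keys and unspecified keys carry over (so `momentum` stays 0.9 after epoch 15). `settings_at_epoch` is a hypothetical helper for illustration, not the repository's API:

```python
def settings_at_epoch(regime, epoch):
    """Merge all regime entries whose 'epoch' is <= the given epoch, in order."""
    settings = {}
    for entry in sorted(regime, key=lambda e: e['epoch']):
        if entry['epoch'] > epoch:
            break
        settings.update({k: v for k, v in entry.items() if k != 'epoch'})
    return settings


regime = [
    {'epoch': 0, 'optimizer': 'SGD', 'lr': 1e-2, 'weight_decay': 5e-4, 'momentum': 0.9},
    {'epoch': 15, 'lr': 1e-3, 'weight_decay': 0},
]
data_regime = [
    {'epoch': 0, 'input_size': 128, 'batch_size': 256},
    {'epoch': 15, 'input_size': 224, 'batch_size': 64},
]

print(settings_at_epoch(regime, 20))
# {'optimizer': 'SGD', 'lr': 0.001, 'weight_decay': 0, 'momentum': 0.9}
print(settings_at_epoch(data_regime, 15))
# {'input_size': 224, 'batch_size': 64}
```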

Citation

If you use the code in your paper, consider citing one of the implemented works.

@inproceedings{hoffer2018fix,
  title={Fix your classifier: the marginal value of training the last weight layer},
  author={Hoffer, Elad and Hubara, Itay and Soudry, Daniel},
  booktitle={International Conference on Learning Representations},
  year={2018},
  url={https://openreview.net/forum?id=S1Dh8Tg0-},
}

@inproceedings{hoffer2018norm,
  title={Norm matters: efficient and accurate normalization schemes in deep networks},
  author={Hoffer, Elad and Banner, Ron and Golan, Itay and Soudry, Daniel},
  booktitle={Advances in Neural Information Processing Systems},
  year={2018}
}

@inproceedings{banner2018scalable,
  title={Scalable Methods for 8-bit Training of Neural Networks},
  author={Banner, Ron and Hubara, Itay and Hoffer, Elad and Soudry, Daniel},
  booktitle={Advances in Neural Information Processing Systems},
  year={2018}
}

@inproceedings{Hoffer_2020_CVPR,
  author = {Hoffer, Elad and Ben-Nun, Tal and Hubara, Itay and Giladi, Niv and Hoefler, Torsten and Soudry, Daniel},
  title = {Augment Your Batch: Improving Generalization Through Instance Repetition},
  booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2020}
}

@article{hoffer2019mix,
  title={Mix \& Match: training convnets with mixed image sizes for improved accuracy, speed and scale resiliency},
  author={Hoffer, Elad and Weinstein, Berry and Hubara, Itay and Ben-Nun, Tal and Hoefler, Torsten and Soudry, Daniel},
  journal={arXiv preprint arXiv:1908.08986},
  year={2019}
}

Comments

TypeError: __init__() got an unexpected keyword argument 'reduction'
Hi, when I try to run the script, I get this error: TypeError: __init__() got an unexpected keyword argument 'reduction'. What's wrong? I'm calling the script like this:

IMAGENET_DIR=ImageNet_DataSet MODEL_NAME=mobilenet python main.py --dataset $IMAGENET_DIR --model $MODEL_NAME

I'm using Python 3.6 and PyTorch 0.4. Thanks a lot in advance.
opened by Coderx7 6

RuntimeError: view size is not compatible with input tensor's size and stride

I have the following config file:

{
    "adapt_grad_norm": null,
    "autoaugment": false,
    "batch_size": 256,
    "chunk_batch": 1,
    "config_file": null,
    "cutmix": null,
    "cutout": false,
    "dataset": "imagenet",
    "datasets_dir": "~/data/",
    "device": "cuda",
    "device_ids": [
        0
    ],
    "dist_backend": "nccl",
    "dist_init": "env://",
    "distributed": false,
    "drop_optim_state": false,
    "dtype": "float",
    "duplicates": 1,
    "epochs": 90,
    "eval_batch_size": -1,
    "evaluate": null,
    "grad_clip": -1,
    "input_size": null,
    "label_smoothing": 0,
    "local_rank": -1,
    "loss_scale": 1,
    "lr": 0.1,
    "mixup": null,
    "model": "alexnet",
    "model_config": "",
    "momentum": 0.9,
    "optimizer": "SGD",
    "print_freq": 10,
    "results_dir": "./results",
    "resume": "",
    "save": "alexnet_unquant",
    "save_all": false,
    "seed": 123,
    "start_epoch": -1,
    "sync_bn": false,
    "tensorwatch": false,
    "tensorwatch_port": 0,
    "weight_decay": 0,
    "workers": 8,
    "world_size": -1
}

I get the following error:

Starting Epoch: 1

Traceback (most recent call last):
  File "main.py", line 364, in <module>
    main()
  File "main.py", line 130, in main
    main_worker(args)
  File "main.py", line 306, in main_worker
    train_results = trainer.train(train_data.get_loader(),
  File "/MyPath/convNet.pytorch/trainer.py", line 269, in train
    return self.forward(data_loader, training=True, average_output=average_output, chunk_batch=chunk_batch)
  File "/MyPath/convNet.pytorch/trainer.py", line 224, in forward
    prec1, prec5 = accuracy(output, target, topk=(1, 5))
  File "/MyPath/convNet.pytorch/utils/meters.py", line 70, in accuracy
    correct_k = correct[:k].view(-1).float().sum(0)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

I get the same error when I try the following command from README.md:

python main.py --model resnet --model-config "{'depth': 18, 'quantize':True}" --save resnet18_8bit -b 64

How can I rectify this?

Thanks.

opened by gakadam 1
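The traceback ends in correct[:k].view(-1), and the error message itself suggests .reshape. The likely cause is that correct is derived from a transposed tensor and is therefore not contiguous on recent PyTorch versions. A sketch of a top-k accuracy function with that one-word change, modeled on the common PyTorch ImageNet-example helper rather than copied from utils/meters.py:

```python
import torch


def accuracy(output, target, topk=(1,)):
    """Top-k accuracy (in percent) of `output` logits against integer `target` labels."""
    maxk = max(topk)
    batch_size = target.size(0)
    _, pred = output.topk(maxk, dim=1, largest=True, sorted=True)
    pred = pred.t()  # (maxk, N); the transpose makes it non-contiguous
    correct = pred.eq(target.view(1, -1).expand_as(pred))
    res = []
    for k in topk:
        # .reshape copies when needed, whereas .view requires compatible strides
        correct_k = correct[:k].reshape(-1).float().sum(0)
        res.append(correct_k.mul_(100.0 / batch_size))
    return res


output = torch.tensor([[0.1, 0.9, 0.0], [0.8, 0.15, 0.05]])
target = torch.tensor([1, 1])
prec1, prec2 = accuracy(output, target, topk=(1, 2))  # 50.0 and 100.0
```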

Hi, is there any explanation for the scale_fix in RangeBN?

A correction factor is applied to the scale in RangeBN: scale_fix = (0.5 * 0.35) * (1 + (math.pi * math.log(4)) ** 0.5) / ((2 * math.log(y.size(-1))) ** 0.5)

(2 * ln(n)) ** 0.5 is explained in the paper. However, where do the 0.5, 0.35 and pi * ln(4) come from?

opened by blueardour 1
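The origin of the constants is not stated here, but their combined effect can be checked numerically. Assuming scale_fix is meant to turn the average per-chunk range (max - min) of roughly Gaussian activations into an estimate of their standard deviation, a quick standard-library check for chunks of n = 1024 values (this does not derive the constants, it only measures the result):

```python
import math
import random


def scale_fix(n):
    """The correction factor quoted above, for chunks of n elements."""
    return (0.5 * 0.35) * (1 + (math.pi * math.log(4)) ** 0.5) / ((2 * math.log(n)) ** 0.5)


def range_std_estimate(sigma, n, num_chunks=200, seed=0):
    """Mean (max - min) over Gaussian chunks, multiplied by scale_fix(n)."""
    rng = random.Random(seed)
    total_range = 0.0
    for _ in range(num_chunks):
        chunk = [rng.gauss(0.0, sigma) for _ in range(n)]
        total_range += max(chunk) - min(chunk)
    return (total_range / num_chunks) * scale_fix(n)


sigma = 2.0
ratio = range_std_estimate(sigma, n=1024) / sigma
print(ratio)  # close to 1 if range * scale_fix tracks the standard deviation
```

The sqrt(2 ln n) term matches the known growth rate of the maximum of n Gaussian samples, which is why the range has to be divided by it.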
AttributeError: 'Image' object has no attribute 'new'

Hi, why am I facing this error? It's complaining about this line: alpha = img.new().resize_(3).normal_(0, self.alphastd)

What's wrong here? I'm using PyTorch 0.4.

opened by Coderx7 1
May I know the software versions?

Hi, @eladhoffer

Thanks for the excellent project. I'm trying the code on my local machine, but some Python package dependencies seem not to be met. Could you share the versions of the key software, such as PyTorch, Python, and the OS?

opened by blueardour 0
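When comparing environments for questions like this one, a small snippet that collects the relevant versions can help. A sketch using the standard platform module plus torch.__version__ and torch.version.cuda:

```python
import platform

import torch


def environment_report():
    """Collect the Python, OS, PyTorch and CUDA versions of the current environment."""
    return {
        "python": platform.python_version(),
        "os": platform.platform(),
        "pytorch": torch.__version__,
        "cuda": torch.version.cuda,  # None for CPU-only builds
    }


for key, value in environment_report().items():
    print(f"{key:8}: {value}")
```

PyTorch also ships python -m torch.utils.collect_env, which prints a more complete report.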

ResNext CIFAR10 crashing during layer construction

When running the command:

python3.6 main.py -b 32 --gpus 1 --model resnext --dataset cifar10

the following error pops up:

Traceback (most recent call last):
  File "main.py", line 306, in <module>
    main()
  File "main.py", line 194, in main
    train_loader, model, criterion, epoch, optimizer)
  File "main.py", line 295, in train
    training=True, optimizer=optimizer)
  File "main.py", line 254, in forward
    output=model(input_var)
  File "/usr/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/vista_fpga/pytorch_repos/elad_hofer_pytorch/convNet.pytorch/models/resnext_original.py", line 158, in forward
    x = self.layer1(x)
  File "/usr/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/lib64/python3.6/site-packages/torch/nn/modules/container.py", line 72, in forward
    input = module(input)
  File "/usr/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/vista_fpga/pytorch_repos/elad_hofer_pytorch/convNet.pytorch/models/resnext_original.py", line 56, in forward
    residual = self.downsample(x)
  File "/usr/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/vista_fpga/pytorch_repos/elad_hofer_pytorch/convNet.pytorch/models/resnext_original.py", line 119, in forward
    return torch.cat([ds, self.zero.expand(*zeros_size)], 1)
RuntimeError: dimension out of range (expected to be in range of [-1, 0], but got 1)

opened by Lancer555 0

How to run the code in Colaboratory?

I'm trying to run main.py in Colab, and I executed the following command:

!python /content/convNet.pytorch/main.py --dataset cifar10 --model resnet --model-config "{'depth': 44}" --duplicates 40 --cutout -b 64 --epochs 100 --save resnet44_cutout_m-40

and I'm getting the following error:

Traceback (most recent call last):
  File "/content/convNet.pytorch/main.py", line 14, in <module>
    from data import DataRegime, SampledDataRegime
  File "/content/convNet.pytorch/data.py", line 9, in <module>
    from utils.dataset import IndexedFileDataset
  File "/content/convNet.pytorch/utils/dataset.py", line 6, in <module>
    from torch.utils.data.sampler import Sampler, RandomSampler, BatchSampler, _int_classes
ImportError: cannot import name '_int_classes' from 'torch.utils.data.sampler' (/usr/local/lib/python3.7/dist-packages/torch/utils/data/sampler.py)

How can I fix this issue? What steps do I need to follow to run the code in Colab, and what changes do I need to make?

opened by abdulsam 2
main.py line 287 defaults = {**train_data_defaults} - SyntaxError: invalid syntax

Hi,

This is the output on the command line:

python main.py --dataset cifar10 --model resnet --model-config "{'depth': 44}" --duplicates 32 --cutout -b 64 --epochs 100 --save resnet_44_cutout_m-32-new
  File "main.py", line 287
    defaults = {**train_data_defaults}
                ^
SyntaxError: invalid syntax

opened by BrandonLiang 1
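{**train_data_defaults} uses dictionary unpacking inside a dict display (PEP 448), which requires Python 3.5 or newer, so a SyntaxError at that line suggests an older interpreter such as Python 2. The two spellings below build the same shallow copy; the dictionary contents are illustrative values, not the script's real defaults:

```python
train_data_defaults = {'batch_size': 64, 'shuffle': True}  # illustrative values only

defaults = {**train_data_defaults}           # Python 3.5+ only (PEP 448)
defaults_compat = dict(train_data_defaults)  # equivalent, also valid on older Pythons

assert defaults == defaults_compat
assert defaults is not train_data_defaults  # both are shallow copies
```

Running the script with a Python 3.5+ interpreter avoids the error without changing the code.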
Nan loss for quantization

I'm running the code for 8-bit quantization, but the training loss always becomes NaN, even though I didn't make any modification to the original code. I'm wondering why this could happen and hoping for your clarification.

opened by syorami 4
Stochastic quantization: difference between code and paper

In the paper, stochastic quantization rounds up with probability p=clip(0.5x, 0, 1) and rounds down with probability 1-p. In the code, however, it's done by adding uniform random noise before quantization: noise = output.new(output.shape).uniform_(-0.5, 0.5); output.add_(noise). This noise does not depend on the magnitude of x. What is the reasoning behind this discrepancy?

opened by michaelklachko 0
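One way to read the code: adding uniform noise in [-0.5, 0.5) and then rounding to the nearest integer is itself stochastic rounding. A value k + f (with 0 ≤ f < 1) rounds up exactly when the noise is at least 0.5 - f, which happens with probability f. So the rounding probability does depend on x through its fractional part, even though the noise distribution does not. A quick standard-library check:

```python
import random


def noisy_round(x, rng):
    """Round to nearest after adding uniform noise in [-0.5, 0.5)."""
    return round(x + rng.uniform(-0.5, 0.5))


rng = random.Random(0)
x = 2.3  # fractional part 0.3
trials = 100_000
ups = sum(noisy_round(x, rng) == 3 for _ in range(trials))
print(ups / trials)  # expected to be close to 0.3
```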
Bug in VGG?

Hi, cool framework! Note that you add an AvgPool2d layer with kernel_size=1 in the VGG class. This has no effect. Perhaps you meant AdaptiveAvgPool2d? In addition, the input to the classification layer is usually 7x7x512 for a 224x224 input.

opened by rosenfeldamir 0
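The first point is easy to verify: nn.AvgPool2d(kernel_size=1), whose stride defaults to the kernel size, leaves the tensor unchanged, whereas nn.AdaptiveAvgPool2d((7, 7)) (the pooling torchvision's VGG uses) always produces a 7x7 map, so the classifier sees 512 * 7 * 7 = 25088 features. A minimal sketch, with random tensors standing in for VGG's final convolutional output:

```python
import torch
import torch.nn as nn

# Stand-in for VGG's last conv output on a 224x224 image: 512 channels at 7x7
features = torch.randn(2, 512, 7, 7)

identity_pool = nn.AvgPool2d(kernel_size=1)   # stride defaults to kernel_size = 1
adaptive_pool = nn.AdaptiveAvgPool2d((7, 7))  # fixed 7x7 output for any input size

same = torch.allclose(identity_pool(features), features)
flat = torch.flatten(adaptive_pool(features), 1)
print(same, flat.shape)  # True torch.Size([2, 25088])

# With larger inputs (e.g. 14x14 maps), the adaptive pool still yields 25088 features
bigger = torch.flatten(adaptive_pool(torch.randn(2, 512, 14, 14)), 1)
print(bigger.shape)  # torch.Size([2, 25088])
```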
