ConvNet training using PyTorch

Overview

Convolutional networks using PyTorch

This is a complete training example for deep convolutional networks on various datasets (ImageNet, CIFAR-10, CIFAR-100, MNIST).

Available models include:

'alexnet', 'amoebanet', 'darts', 'densenet', 'googlenet', 'inception_resnet_v2', 'inception_v2', 'mnist', 'mobilenet', 'mobilenet_v2', 'nasnet', 'resnet', 'resnet_se', 'resnet_zi', 'resnet_zi_se', 'resnext', 'resnext_se'

It is based on the ImageNet example in PyTorch, with helpful additions such as:

  • Training on several datasets other than imagenet
  • Complete logging of trained experiment
  • Graph visualization of the training/validation loss and accuracy
  • Definition of preprocessing and optimization regime for each model
  • Distributed training

To clone:

git clone --recursive https://github.com/eladhoffer/convNet.pytorch

Example of efficient multi-GPU training of ResNet-50 (4 GPUs, label smoothing):

python -m torch.distributed.launch --nproc_per_node=4  main.py --model resnet --model-config "{'depth': 50}" --eval-batch-size 512 --save resnet50_ls --label-smoothing 0.1
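
The --label-smoothing 0.1 flag in the command above refers to label smoothing. As a rough, library-free sketch of the general technique (not necessarily how this repository implements it), the target distribution mixes the one-hot label with a uniform distribution over the classes. The function name and its arguments are illustrative assumptions.

```python
import math


def smoothed_cross_entropy(logits, target, smoothing=0.1):
    """Cross-entropy against a label-smoothed target distribution.

    Assumed formulation: the true class gets probability
    (1 - smoothing) + smoothing / K and every other class smoothing / K,
    where K is the number of classes.
    """
    num_classes = len(logits)
    # Numerically stable log-softmax
    max_logit = max(logits)
    log_norm = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    log_probs = [z - log_norm for z in logits]

    uniform = smoothing / num_classes
    loss = 0.0
    for k, log_p in enumerate(log_probs):
        weight = uniform + (1.0 - smoothing if k == target else 0.0)
        loss -= weight * log_p
    return loss
```

With smoothing=0 this reduces to the ordinary cross-entropy; with a positive value, a confident correct prediction is penalized slightly more than before, which discourages over-confident outputs.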

This code can be used to implement several recent papers:

Dependencies

Data

  • Configure your dataset path with the --datasets-dir argument
  • To get the ILSVRC data, register on the ImageNet site for access: http://www.image-net.org/

Model configuration

A network model is defined by writing a .py file in the models folder and selecting it with the model flag. The model function must be registered in models/__init__.py and must return a trainable network. It can also specify additional training options, such as an optimization regime (either a dictionary or a function) and input transform modifications.

For example, a model definition:

import torch.nn as nn


class Model(nn.Module):

    def __init__(self, num_classes=1000):
        super(Model, self).__init__()
        self.model = nn.Sequential(...)  # placeholder: list the network layers here

        # Optimization settings, keyed by the epoch at which they start
        self.regime = [
            {'epoch': 0, 'optimizer': 'SGD', 'lr': 1e-2,
                'weight_decay': 5e-4, 'momentum': 0.9},
            {'epoch': 15, 'lr': 1e-3, 'weight_decay': 0}
        ]

        # Input size and batch size, keyed by the epoch at which they start
        self.data_regime = [
            {'epoch': 0, 'input_size': 128, 'batch_size': 256},
            {'epoch': 15, 'input_size': 224, 'batch_size': 64}
        ]

    def forward(self, inputs):
        return self.model(inputs)


def model(**kwargs):
    return Model()
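
To illustrate how an epoch-indexed regime like the one above can be read, here is a minimal sketch that accumulates the settings from every entry whose 'epoch' has been reached. The helper name resolve_regime is hypothetical, and the rule that later entries override earlier ones while unspecified keys carry over is an illustrative assumption, not a description of the repository's own code.

```python
def resolve_regime(regime, epoch):
    """Return the settings in effect at `epoch` (hypothetical helper)."""
    settings = {}
    for phase in sorted(regime, key=lambda p: p['epoch']):
        if phase['epoch'] > epoch:
            break
        # Later phases override earlier keys; other keys carry over.
        settings.update({k: v for k, v in phase.items() if k != 'epoch'})
    return settings


regime = [
    {'epoch': 0, 'optimizer': 'SGD', 'lr': 1e-2,
     'weight_decay': 5e-4, 'momentum': 0.9},
    {'epoch': 15, 'lr': 1e-3, 'weight_decay': 0},
]

early = resolve_regime(regime, epoch=3)   # settings from the first entry only
late = resolve_regime(regime, epoch=20)   # second entry overrides lr and weight_decay
```

Under this reading, the second entry only needs to list the values that change at epoch 15; the optimizer and momentum stay as set at epoch 0.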

Citation

If you use the code in your paper, consider citing one of the implemented works.

@inproceedings{hoffer2018fix,
  title={Fix your classifier: the marginal value of training the last weight layer},
  author={Elad Hoffer and Itay Hubara and Daniel Soudry},
  booktitle={International Conference on Learning Representations},
  year={2018},
  url={https://openreview.net/forum?id=S1Dh8Tg0-},
}
@inproceedings{hoffer2018norm,
  title={Norm matters: efficient and accurate normalization schemes in deep networks},
  author={Hoffer, Elad and Banner, Ron and Golan, Itay and Soudry, Daniel},
  booktitle={Advances in Neural Information Processing Systems},
  year={2018}
}
@inproceedings{banner2018scalable,
  title={Scalable Methods for 8-bit Training of Neural Networks},
  author={Banner, Ron and Hubara, Itay and Hoffer, Elad and Soudry, Daniel},
  booktitle={Advances in Neural Information Processing Systems},
  year={2018}
}
@inproceedings{Hoffer_2020_CVPR,
  author = {Hoffer, Elad and Ben-Nun, Tal and Hubara, Itay and Giladi, Niv and Hoefler, Torsten and Soudry, Daniel},
  title = {Augment Your Batch: Improving Generalization Through Instance Repetition},
  booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2020}
}
@article{hoffer2019mix,
  title={Mix \& Match: training convnets with mixed image sizes for improved accuracy, speed and scale resiliency},
  author={Hoffer, Elad and Weinstein, Berry and Hubara, Itay and Ben-Nun, Tal and Hoefler, Torsten and Soudry, Daniel},
  journal={arXiv preprint arXiv:1908.08986},
  year={2019}
}
Comments
  • TypeError: __init__() got an unexpected keyword argument 'reduction'

    Hi, when I try to run the script I get the error below. What's wrong?

    TypeError: __init__() got an unexpected keyword argument 'reduction'

    I'm calling the script like this:

    IMAGENET_DIR=ImageNet_DataSet
    MODEL_NAME=mobilenet
    python main.py --dataset $IMAGENET_DIR --model $MODEL_NAME 
    

    I'm using Python 3.6 and PyTorch 0.4. Thanks a lot in advance.

    opened by Coderx7 6
  • RuntimeError: view size is not compatible with input tensor's size and stride

    I have the following config file:

    {
        "adapt_grad_norm": null,
        "autoaugment": false,
        "batch_size": 256,
        "chunk_batch": 1,
        "config_file": null,
        "cutmix": null,
        "cutout": false,
        "dataset": "imagenet",
        "datasets_dir": "~/data/",
        "device": "cuda",
        "device_ids": [
            0
        ],
        "dist_backend": "nccl",
        "dist_init": "env://",
        "distributed": false,
        "drop_optim_state": false,
        "dtype": "float",
        "duplicates": 1,
        "epochs": 90,
        "eval_batch_size": -1,
        "evaluate": null,
        "grad_clip": -1,
        "input_size": null,
        "label_smoothing": 0,
        "local_rank": -1,
        "loss_scale": 1,
        "lr": 0.1,
        "mixup": null,
        "model": "alexnet",
        "model_config": "",
        "momentum": 0.9,
        "optimizer": "SGD",
        "print_freq": 10,
        "results_dir": "./results",
        "resume": "",
        "save": "alexnet_unquant",
        "save_all": false,
        "seed": 123,
        "start_epoch": -1,
        "sync_bn": false,
        "tensorwatch": false,
        "tensorwatch_port": 0,
        "weight_decay": 0,
        "workers": 8,
        "world_size": -1
    }
    

    I get the following error:

    Starting Epoch: 1
    
    Traceback (most recent call last):
      File "main.py", line 364, in <module>
        main()
      File "main.py", line 130, in main
        main_worker(args)
      File "main.py", line 306, in main_worker
        train_results = trainer.train(train_data.get_loader(),
      File "/MyPath/convNet.pytorch/trainer.py", line 269, in train
        return self.forward(data_loader, training=True, average_output=average_output, chunk_batch=chunk_batch)
      File "/MyPath/convNet.pytorch/trainer.py", line 224, in forward
        prec1, prec5 = accuracy(output, target, topk=(1, 5))
      File "/MyPath/convNet.pytorch/utils/meters.py", line 70, in accuracy
        correct_k = correct[:k].view(-1).float().sum(0)
    RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
    

    I get the same error when I run the following command from README.md:

    python main.py --model resnet --model-config "{'depth': 18, 'quantize':True}" --save resnet18_8bit -b 64
    

    How to rectify this?

    Thanks.

    opened by gakadam 1
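
The error message itself suggests the remedy: correct[:k] is a slice of a transposed tensor and may be non-contiguous, which .view(-1) cannot flatten but .reshape(-1) can. Below is a sketch of a top-k accuracy function written that way. It is modeled on the single line shown in the traceback; the rest of the function body is an assumption rather than the repository's actual utils/meters.py.

```python
import torch


def accuracy(output, target, topk=(1,)):
    """Top-k precision (in percent) for each k in `topk`."""
    maxk = max(topk)
    batch_size = target.size(0)

    # Indices of the maxk highest scores: (batch, maxk) -> (maxk, batch)
    _, pred = output.topk(maxk, dim=1, largest=True, sorted=True)
    pred = pred.t()
    correct = pred.eq(target.view(1, -1).expand_as(pred))

    results = []
    for k in topk:
        # reshape() copies when needed, so it also works on non-contiguous slices
        correct_k = correct[:k].reshape(-1).float().sum(0)
        results.append(correct_k.mul(100.0 / batch_size))
    return results
```

Called as accuracy(output, target, topk=(1, 5)), it returns one tensor per requested k, matching the prec1, prec5 unpacking seen in the traceback.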
  • Is there any explanation of scale_fix in RangeBN?

    A fix is added in RangeBN for the scale:

    scale_fix = (0.5 * 0.35) * (1 + (math.pi * math.log(4)) ** 0.5) / ((2 * math.log(y.size(-1))) ** 0.5)

    The (2 * ln(n)) ** 0.5 term is explained in the paper. However, where do the 0.5, 0.35, and pi * ln(4) terms come from?

    opened by blueardour 1
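
For reference, the expression quoted in this issue can be evaluated directly to see how it behaves as the size n of the reduced dimension grows. This sketch only transcribes the formula into a standalone function (the name range_bn_scale_fix is hypothetical); it does not explain where the constants come from.

```python
import math


def range_bn_scale_fix(n):
    """Evaluate the scale_fix expression quoted in the issue for size n > 1."""
    numerator = (0.5 * 0.35) * (1 + (math.pi * math.log(4)) ** 0.5)
    return numerator / ((2 * math.log(n)) ** 0.5)


for n in (16, 64, 256, 1024):
    print(n, round(range_bn_scale_fix(n), 4))
```

The numerator is a constant, so the factor shrinks slowly as n grows, in proportion to 1 / sqrt(2 ln n).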
  • AttributeError: 'Image' object has no attribute 'new'

    Hi, why am I getting this error? It complains about this line:

    alpha = img.new().resize_(3).normal_(0, self.alphastd)

    What's wrong here? I'm using PyTorch 0.4.

    opened by Coderx7 1
  • May I know the software versions

    Hi, @eladhoffer

    Thanks for the excellent project. I'm trying the code on my local machine, but some Python package dependencies seem to be unmet. Could you share the versions of the key software, such as PyTorch, Python, and the OS?

    opened by blueardour 0
  • ResNext CIFAR10 crashing during layer construction

    When running the command:

    python3.6 main.py -b 32 --gpus 1 --model resnext --dataset cifar10

    the following error appears:

    Traceback (most recent call last):
      File "main.py", line 306, in <module>
        main()
      File "main.py", line 194, in main
        train_loader, model, criterion, epoch, optimizer)
      File "main.py", line 295, in train
        training=True, optimizer=optimizer)
      File "main.py", line 254, in forward
        output=model(input_var)
      File "/usr/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/vista_fpga/pytorch_repos/elad_hofer_pytorch/convNet.pytorch/models/resnext_original.py", line 158, in forward
        x = self.layer1(x)
      File "/usr/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
        result = self.forward(*input, **kwargs)
      File "/usr/lib64/python3.6/site-packages/torch/nn/modules/container.py", line 72, in forward
        input = module(input)
      File "/usr/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/vista_fpga/pytorch_repos/elad_hofer_pytorch/convNet.pytorch/models/resnext_original.py", line 56, in forward
        residual = self.downsample(x)
      File "/usr/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/vista_fpga/pytorch_repos/elad_hofer_pytorch/convNet.pytorch/models/resnext_original.py", line 119, in forward
        return torch.cat([ds, self.zero.expand(*zeros_size)], 1)
    RuntimeError: dimension out of range (expected to be in range of [-1, 0], but got 1)
    
    
    opened by Lancer555 0
  • How to run the code in Colaboratory?

    I'm trying to run main.py in Colab, and I executed the following command:

    !python /content/convNet.pytorch/main.py --dataset cifar10 --model resnet --model-config "{'depth': 44}" --duplicates 40 --cutout -b 64 --epochs 100 --save resnet44_cutout_m-40

    and I'm getting the following error

    Traceback (most recent call last):
      File "/content/convNet.pytorch/main.py", line 14, in <module>
        from data import DataRegime, SampledDataRegime
      File "/content/convNet.pytorch/data.py", line 9, in <module>
        from utils.dataset import IndexedFileDataset
      File "/content/convNet.pytorch/utils/dataset.py", line 6, in <module>
        from torch.utils.data.sampler import Sampler, RandomSampler, BatchSampler, _int_classes
    ImportError: cannot import name '_int_classes' from 'torch.utils.data.sampler' (/usr/local/lib/python3.7/dist-packages/torch/utils/data/sampler.py)

    How can I fix this issue? What steps do I need to follow to be able to run the code in colab? What changes do I need to make?

    opened by abdulsam 2
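
The private name _int_classes is not importable from torch.utils.data.sampler in the PyTorch version shown in the traceback. One possible workaround, sketched here as an assumption rather than an official fix, is to fall back to the built-in int when the import fails. In utils/dataset.py the other names (Sampler, RandomSampler, BatchSampler) would still be imported as before. The helper is_valid_batch_size is hypothetical and only shows how the name is typically used.

```python
try:
    # Private name; only present in some older PyTorch releases
    from torch.utils.data.sampler import _int_classes
except ImportError:
    # Assumed equivalent on Python 3: plain integers
    _int_classes = int


def is_valid_batch_size(value):
    """Example check of the kind samplers perform on integer arguments."""
    return (isinstance(value, _int_classes)
            and not isinstance(value, bool)
            and value > 0)
```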
  • main.py line 287 defaults = {**train_data_defaults} - SyntaxError: invalid syntax

    Hi,

    This is the output on command line:

    python main.py --dataset cifar10 --model resnet --model-config "{'depth': 44}" --duplicates 32 --cutout -b 64 --epochs 100 --save resnet_44_cutout_m-32-new
      File "main.py", line 287
        defaults = {**train_data_defaults}
                    ^
    SyntaxError: invalid syntax

    opened by BrandonLiang 1
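
The {**mapping} syntax inside a dict literal requires Python 3.5 or later (PEP 448), so this SyntaxError typically means that python resolves to an older interpreter, for example Python 2. The sketch below shows what such a line does, alongside an equivalent spelling that older interpreters also accept. The keys and values are made up for illustration.

```python
import sys

# Hypothetical defaults, for illustration only
train_data_defaults = {'batch_size': 64, 'input_size': 32}

# Dict unpacking (PEP 448): a syntax error before Python 3.5
defaults = {**train_data_defaults, 'cutout': True}

# Equivalent spelling that older interpreters also accept
defaults_portable = dict(train_data_defaults, cutout=True)

print(sys.version_info[:2], defaults == defaults_portable)
```

Both forms build a new dictionary and leave train_data_defaults unchanged. Running python --version is a quick way to confirm which interpreter is being used.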
  • NaN loss for quantization

    I'm running the code for 8-bit quantization, but the training loss always becomes NaN even though I made no modifications to the original code. Why could this happen? I'd appreciate any clarification.

    opened by syorami 4
  • Stochastic quantization: difference between code and paper

    In the paper, stochastic quantization is done by rounding up with probability p=clip(0.5x, 0, 1) and rounding down with probability 1-p. In the code, however, it is done by adding random uniform noise before quantization:

    noise = output.new(output.shape).uniform_(-0.5, 0.5)
    output.add_(noise)

    This noise does not depend on the magnitude of x. What is the reasoning behind this discrepancy?

    opened by michaelklachko 0
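
One observation that may help when comparing the two formulations: adding uniform noise in [-0.5, 0.5) and then rounding to the nearest integer rounds a value up with probability equal to its fractional part, so the result is unbiased in expectation. The pure-Python simulation below checks this empirically. It is a sketch of the general technique, not the repository's quantization code, and it does not settle how this relates to the probability stated in the paper.

```python
import math
import random


def noisy_round(x, rng):
    """Round x to the nearest integer after adding uniform noise."""
    noise = rng.uniform(-0.5, 0.5)
    return math.floor(x + noise + 0.5)


def empirical_mean(x, trials=200000, seed=0):
    """Average of many noisy roundings of the same value x."""
    rng = random.Random(seed)
    return sum(noisy_round(x, rng) for _ in range(trials)) / trials


mean_pos = empirical_mean(2.3)    # expected to be close to 2.3
mean_neg = empirical_mean(-1.8)   # expected to be close to -1.8
```

Although the noise itself is independent of x, the chance that it pushes x across the next rounding boundary depends on how close x already is to that boundary, which is where the value-dependent probability comes from.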
  • bug in vgg?

    Hi, cool framework! Note that you add an AvgPool2d layer with kernel=1 in the VGG class, which has essentially no effect. Perhaps you meant AdaptiveAvgPool2d? In addition, the input to the classification layer is usually 7x7x512, given an input of 224x224.

    opened by rosenfeldamir 0
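
The classifier input size mentioned in this issue follows from the standard VGG layout, in which five 2x2 max-pooling stages each halve the spatial size and the last convolutional block has 512 channels. For a 224x224 input this gives 7 * 7 * 512 features. A small arithmetic sketch (the function name is hypothetical):

```python
def vgg_flat_features(input_size=224, num_pools=5, channels=512):
    """Flattened feature count after `num_pools` stride-2 poolings."""
    spatial = input_size
    for _ in range(num_pools):
        spatial //= 2  # each 2x2, stride-2 max-pool halves height and width
    return spatial * spatial * channels


imagenet_features = vgg_flat_features(224)  # 7 * 7 * 512
cifar_features = vgg_flat_features(32)      # 1 * 1 * 512
```
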
Owner
Elad Hoffer