DeepLab resnet v2 model in pytorch

Overview

pytorch-deeplab-resnet

DeepLab resnet v2 model implementation in pytorch.

The architecture of DeepLab-ResNet has been replicated exactly from the caffe implementation. This architecture calculates losses on input images at three scales (1x, 0.75x, 0.5x), with one loss computed per scale. In addition to these 3 losses, a fourth loss is calculated after merging the output score maps from the 3 scales. These 4 losses are added to give the total loss.
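
The way the four losses combine can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the code in this repository: the function names are hypothetical, the score maps are assumed to be already resized to the label resolution, and merging by element-wise maximum is an assumption (the text above only says the score maps are merged).

```python
import math


def cross_entropy(scores, labels):
    """Mean softmax cross-entropy over pixels.

    scores: list of per-pixel class-score lists.
    labels: list of integer class ids, one per pixel.
    """
    total = 0.0
    for pixel_scores, label in zip(scores, labels):
        top = max(pixel_scores)  # subtract the max for numerical stability
        log_sum = top + math.log(sum(math.exp(s - top) for s in pixel_scores))
        total += log_sum - pixel_scores[label]
    return total / len(labels)


def merge_max(score_maps):
    """Element-wise max over score maps of identical shape (assumed merge rule)."""
    return [[max(values) for values in zip(*pixels)] for pixels in zip(*score_maps)]


def total_loss(score_maps, labels):
    """Sum of one loss per scale plus one loss on the merged score map."""
    scale_losses = [cross_entropy(scores, labels) for scores in score_maps]
    merged_loss = cross_entropy(merge_max(score_maps), labels)
    return sum(scale_losses) + merged_loss, scale_losses, merged_loss


# Toy example: 3 scales, 2 pixels, 2 classes (values are made up).
toy_maps = [
    [[2.0, 0.5], [0.1, 1.5]],  # 1x
    [[1.0, 1.0], [0.3, 0.9]],  # 0.75x
    [[0.2, 0.4], [0.8, 0.1]],  # 0.5x
]
toy_total, toy_scale_losses, toy_merged = total_loss(toy_maps, [0, 1])
```

With three scales, `total_loss` returns the sum of four terms: three per-scale losses and the loss on the merged map.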

Updates

18 July 2017

  • One more evaluation script, evalpyt2.py, has been added. The old evaluation script, evalpyt.py, uses a different methodology to take the mean of IOUs than the one used by the authors. The results section has been updated to incorporate this change.

24 June 2017

  • Weights over the 3 scales (1x, 0.75x, 0.5x) are now shared, as in the caffe implementation. Previously, each of the 3 scales had separate weights. Results are almost the same after making this change (more in the results section). However, the size of the trained .pth model has reduced significantly. Memory occupied on the GPU (11.9 GB) and time taken (~3.5 hours) during training are the same as before. Links to the corresponding .pth files have been updated.
  • Custom data can be used to train pytorch-deeplab-resnet using train.py. The flag --NoLabels (total number of labels in the training data) has been added to train.py and evalpyt.py for this purpose. Please note that labels must be denoted by contiguous values (starting from 0) in the ground truth images. For example, if there are 7 (no_labels) different labels, then each ground truth image must have these labels as 0,1,2,3,...6 (no_labels-1).
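
The contiguous-label requirement can be checked, and non-contiguous ground truth remapped, with a few lines of Python. The sketch below is illustrative only: the helper names are hypothetical and not part of this repository, and images are represented as nested lists rather than arrays to keep the example dependency-free.

```python
def build_label_mapping(original_values):
    """Map every distinct label value in the dataset to 0..N-1 (sorted order)."""
    return {value: index for index, value in enumerate(sorted(set(original_values)))}


def apply_label_mapping(gt_rows, mapping):
    """Return a copy of a 2-D ground-truth image with its labels remapped."""
    return [[mapping[value] for value in row] for row in gt_rows]


def labels_in_range(gt_rows, no_labels):
    """True if every pixel value lies in 0..no_labels-1."""
    return all(0 <= value < no_labels for row in gt_rows for value in row)


# Toy example: a 2x2 ground-truth image that uses the values 0, 128 and 255.
toy_mapping = build_label_mapping([0, 128, 255])
toy_gt = [[0, 255], [128, 0]]
toy_remapped = apply_label_mapping(toy_gt, toy_mapping)
```

The mapping is built once from all values that occur in the dataset, so that the same original value maps to the same class id in every image.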

The older version (prior to 24 June 2017) is available here.

Usage

Note that this repository has been tested with python 2.7 only.

Converting released caffemodel to pytorch model

To convert the caffemodel released by authors, download the deeplab-resnet caffemodel (train_iter_20000.caffemodel) pretrained on VOC into the data folder. After that, run

python convert_deeplab_resnet.py

to generate the corresponding pytorch model file (.pth). The generated .pth snapshot file gives exactly the same test performance as the caffemodel does in caffe (as shown by the numbers in the results section). If you do not want to generate the .pth file yourself, you can download it here.

To run convert_deeplab_resnet.py, deeplab v2 caffe and pytorch (python 2.7) are required.

If you want to train your model in pytorch, move to the next section.

Training

Step 1: Convert init.caffemodel to a .pth file: init.caffemodel contains MS COCO trained weights. We use these weights as the initialization for all but the final layer of our model. For the last layer, we use a random Gaussian with a standard deviation of 0.01 as the initialization. To convert init.caffemodel to a .pth file, run (or download the converted .pth here)

python init_net_surgery.py

To run init_net_surgery.py, deeplab v2 caffe and pytorch (python 2.7) are required.

Step 2: Now that we have our initialization, we can train deeplab-resnet by running,

python train.py

To get a description of each command-line argument, run

python train.py -h

To run train.py, pytorch (python 2.7) is required.

By default, snapshots are saved every 1000 iterations in data/snapshots. The following features have been implemented in this repository:

  • Training regime is the same as that of the caffe implementation - SGD with momentum is used, along with the poly lr decay policy. A weight decay has been used. The last layer has 10 times the learning rate of other layers.
  • The iter_size parameter of caffe has been implemented, effectively increasing the batch size to batch_size times iter_size.
  • Random flipping and random scaling of the input are used as data augmentation. The caffe implementation uses 5 fixed scales (0.5, 0.75, 1, 1.25, 1.5), while in the pytorch implementation the scale is picked randomly from the range [0.5, 1.3] at each iteration.
  • The boundary label (255 in the ground truth labels) is not ignored in the loss function in the current version; instead, it is merged with the background. The ignore_label caffe parameter may be implemented in a future version. Post-processing using CRF has not been implemented.
  • Batchnorm parameters are kept fixed during training. Also, caffe setting use_global_stats = True is reproduced during training. Running mean and variance are not calculated during training.
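
The learning-rate schedule from the first bullet can be sketched as follows. This is a minimal illustration, not the code in train.py: the function names are hypothetical, and the exponent power=0.9 is an assumption (the list above only states that the poly policy is used and that the last layer has 10 times the learning rate of the other layers).

```python
def poly_lr(base_lr, iteration, max_iter, power=0.9):
    """Poly decay: base_lr * (1 - iteration / max_iter) ** power."""
    return base_lr * (1.0 - float(iteration) / max_iter) ** power


def group_lrs(base_lr, iteration, max_iter, power=0.9, last_layer_mult=10.0):
    """Learning rates for the two parameter groups at a given iteration."""
    lr = poly_lr(base_lr, iteration, max_iter, power)
    return {"other_layers": lr, "last_layer": last_layer_mult * lr}


# Schedule sampled every 5000 iterations, using the --lr and --maxIter
# values from the results section (0.00025 and 20000).
toy_schedule = [group_lrs(0.00025, it, 20000) for it in range(0, 20001, 5000)]
```

In a pytorch training loop, values like these would typically be written into the `lr` field of each optimizer parameter group before every step.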

When run on an Nvidia Titan X GPU, train.py occupies about 11.9 GB of memory.

Evaluation

Evaluation of the saved models can be done by running

python evalpyt.py

To get a description of each command-line argument, run

python evalpyt.py -h

Results

When trained on the VOC augmented training set (10582 images) using the MS COCO pretrained initialization in pytorch, we get a validation performance of 72.40% (evalpyt2.py, on VOC). The corresponding .pth file can be downloaded here. This compares to the 75.54% achieved by train_iter_20000.caffemodel released by the authors, which can be replicated by running this file. The .pth model converted from the .caffemodel as described in the first section also gives 75.54% mean IOU. A previous version of this file reported a mean IOU of 78.48% for the pytorch-trained model; that number was calculated in a different way (evalpyt.py computes the mean IOU for each image and averages these values, which differs from the method used by the authors).

To replicate this performance, run

python train.py --lr 0.00025 --wtDecay 0.0005 --maxIter 20000 --GTpath <train gt images path here> --IMpath <train images path here> --LISTpath data/list/train_aug.txt
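
The two ways of averaging IOU mentioned in the results can give noticeably different numbers. The sketch below contrasts them on toy data. It is not the code of evalpyt.py or evalpyt2.py: the function names are hypothetical, images are flat lists of labels, and skipping classes that are absent from an image in the per-image variant is an assumption.

```python
def confusion_matrix(gt, pred, num_classes):
    """hist[g][p] counts pixels with ground truth g that were predicted as p."""
    hist = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if 0 <= g < num_classes and 0 <= p < num_classes:
            hist[g][p] += 1
    return hist


def per_class_iou(hist):
    """IOU per class; None for classes with an empty union."""
    n = len(hist)
    ious = []
    for c in range(n):
        tp = hist[c][c]
        union = sum(hist[c]) + sum(hist[r][c] for r in range(n)) - tp
        ious.append(tp / float(union) if union else None)
    return ious


def mean_of(values):
    present = [v for v in values if v is not None]
    return sum(present) / len(present)


def dataset_mean_iou(images, num_classes):
    """Accumulate one confusion matrix over all images, then average class IOUs."""
    total = [[0] * num_classes for _ in range(num_classes)]
    for gt, pred in images:
        hist = confusion_matrix(gt, pred, num_classes)
        for r in range(num_classes):
            for c in range(num_classes):
                total[r][c] += hist[r][c]
    return mean_of(per_class_iou(total))


def per_image_mean_iou(images, num_classes):
    """Compute a mean IOU for each image, then average over images."""
    scores = [mean_of(per_class_iou(confusion_matrix(gt, pred, num_classes)))
              for gt, pred in images]
    return sum(scores) / len(scores)


# Toy data: (ground truth, prediction) pairs for two 4-pixel images.
toy_images = [
    ([0, 0, 0, 0], [0, 0, 0, 0]),  # background only, predicted perfectly
    ([0, 0, 1, 1], [0, 0, 0, 1]),  # one foreground pixel missed
]
toy_dataset_iou = dataset_mean_iou(toy_images, 2)
toy_per_image_iou = per_image_mean_iou(toy_images, 2)
```

On this toy data the per-image average is higher than the dataset-level value, because the easy background-only image counts as a full score of its own instead of contributing only its pixels.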

Dataset

The model presented in the results section was trained using the augmented VOC train set which was released by this paper. You may download this augmented data directly from here.

Note that this code can be used to train pytorch-deeplab-resnet model for other datasets also.

Acknowledgement

A part of the code has been borrowed from https://github.com/ry/tensorflow-resnet.

Comments
  • Fine tuning on a smaller GPU

    Hi,

    I have a datase with 2 classes in VOC format. as I realized, you have prepared the fine tuning by some flags. Correct me if I'm wrong.

    Besides, my GPU is GTX1060 with 6G memory. Does the calculated memory consumption belongs to the full 21 classes original VOC? I mean can I train the model on this small dataset?

    opened by MyVanitar 12
  • Wrong Evaluation script

    Nice work porting the model. I found that your evaluation code is wrong. You are evaluating on an image by image basis and summing up IoUs across the val. set. That is not how mean IoU is computed. It is accumulated over pixels. Refer to the FCN code here to see what I mean: https://github.com/shelhamer/fcn.berkeleyvision.org/blob/master/score.py

    If I change your evaluation to the correct eval. script, your trained model gets a mIoU of only 72.1%. Also, the deeplab resnet-101 model you had ported into torch only gives 75.4% as opposed to the original 76.3% from CAFFE. This might be due to the different preprocessing you do compared to the deeplab guys or might be due to small errors in porting the model.

    It will be great if you can confirm this and put out a log in the README saying you are fixing your eval. script.

    bug 
    opened by swamiviv 12
  • TypeError: 'float' object cannot be interpreted as an integer

    Thanks for the amazing work. I am trying to use your model description for another segmentation problem.

    But when I run

    python train.py

    this is the error log that I get.

    Traceback (most recent call last):
      File "train.py", line 219, in <module>
        images, label = get_data_from_chunk_v2(chunk)
      File "train.py", line 113, in get_data_from_chunk_v2
        labels = [resize_label_batch(gt,i) for i in [a,a,b,a]]
      File "train.py", line 113, in <listcomp>
        labels = [resize_label_batch(gt,i) for i in [a,a,b,a]]
      File "train.py", line 65, in resize_label_batch
        label_resized = np.zeros((size,size,1,label.shape[3]))
    TypeError: 'float' object cannot be interpreted as an integer

    Additional Info: I am running python 3.5

    opened by mohitsharma-sh 10
  • model.load_state_dict(saved_state_dict) error

    Excuse me , when i run 'python train.py',a mistake happened as follow:

      File "train.py", line 222, in <module>
        model.load_state_dict(saved_state_dict)
      File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 331, in load_state_dict
        .format(name))
    KeyError: 'unexpected key "Scale.conv1.weight" in state_dict'

    I use the coco pretrained model 'MS_DeepLab_resnet_pretrained_COCO_init.pth' to fine tune by voc , hope for response , Thank you !

    opened by lianshushu 9
  • Issue while evaluating trained model

    RuntimeError: sizes do not match at /b/wheel/pytorch-src/torch/lib/THC/THCTensorCopy.cu:31

    I have finetuning this model to train on my custom dataset of images. The groundtruth has only two labels [0 and 255]. However when I test my image using the eval2.py script, I get the following error:

    {'--gpu0': '0', '--help': False, '--snapPrefix': 'VOC12_scenes_', '--testGTpath': '/mnt/VTSRAID01/SAMI_EXPERIMENTS/Segmentation/DataForAnalytics/GIS_ALL_IMAGES/BinaryResizedGroundtruthPng/', '--testIMpath': '/mnt/VTSRAID01/SAMI_EXPERIMENTS/Segmentation/DataForAnalytics/GIS_ALL_IMAGES/ResizedOriginalImages/', '--visualize': True}
    VOC12_scenes_
    Traceback (most recent call last):
      File "evalpyt2.py", line 87, in <module>
        model.load_state_dict(saved_state_dict)
      File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 335, in load_state_dict
        own_state[name].copy_(param)
    RuntimeError: sizes do not match at /b/wheel/pytorch-src/torch/lib/THC/THCTensorCopy.cu:31

    I have crosschecked and the sizes of my input image and the groundtruth do match. I am not sure what is causing this error.

    Any help would be much appreciated.

    opened by sequae92 7
  • Training produces model generating blank segmentations

    Hi, thanks for the work implementing the model and training script.

    I'm attempting to train on optical flow RGB image data with binary segmentation masks (where 0=background and 1=foreground). However, no matter my choice of hyperparameters or number of iterations (20k/40k/80k), the loss generally steadily decreases but the resulting model predicts all background pixels for input images at test time.

    I've confirmed that the pretrained model segments correctly, so there's something wrong in the training process. The weights are non-zero but argmax always seems to choose class 0. Do you have any idea what might be wrong?

    I'm using a GeForce GTX 1080 and Python 2.7 and am not encountering any memory or other such errors during training.

    opened by aeonstasis 6
  • segmentation results and iteration number setting

    Why do you use 20000 iteration size with only batch_size=1? In this way, each training image is only passed for 2 times to get the final model.

    Another question is that I have run evalpyt2.py and your trained VOC12_scenes_16000.pth model to test. But I only got a validation performance of 66.7%. Do you know why it is so low? Thanks!

    opened by lemondan 6
  • Inconsistency in memory consumption of Resnet-101 libraries

    Hi, Thank you @isht7 for writing this code. I am having problems in the memory used by the code. If I use a batch size of 1, the memory consumed is around 7-8 GB. I have only 1 GPU and hence I cannot increase the batch size further. However, when I used this library - https://github.com/speedinghzl/Pytorch-Deeplab which implements Deeplabv2 Resnet 101, the batch size can be increased to 10. Isn't this unusual? Could you tell me of any changes I need to make to your code so that I can increase batch size? My GPU has 11.1 GB of memory. Thanking you in anticipation. Regards, Omkar.

    opened by omkar13 5
  • Dataset ?

    Hi, great code!

    Could you specify exactly which dataset corresponds to train_aug.txt and val.txt and where to get it ? It is the augmented VOC2012, right?

    opened by Matlmr 5
  • BatchNorm usage

    Hi, the parameters of BatchNorm layer in resnet101 is fixed by here

    But the running_mean and running_var is also need to be fix, so I think we need to set BatchNorm to eval mode, not just fix parameters (weight and bias)

    opened by Eniac-Xie 5
  • UnsamplingBilinear2d throwing error

    Hello I am using anaconda environment to execute your train code. But I am getting below error:

    /home/abhash/anaconda3/lib/python3.6/site-packages/torch/nn/modules/upsampling.py:180: UserWarning: nn.UpsamplingBilinear2d is deprecated. Use nn.Upsample instead.
      warnings.warn("nn.UpsamplingBilinear2d is deprecated. Use nn.Upsample instead.")
    Traceback (most recent call last):
      File "train.py", line 222, in <module>
        out = model(images)
      File "/home/abhash/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/abhash/Documents/Abhash/MG/deeplab-resnet/pytorch-deeplab-resnet/deeplab_resnet.py", line 198, in forward
        out.append(self.interp3(self.Scale(x2))) # for 0.75x scale
      File "/home/abhash/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/abhash/anaconda3/lib/python3.6/site-packages/torch/nn/modules/upsampling.py", line 181, in forward
        return super(UpsamplingBilinear2d, self).forward(input)
      File "/home/abhash/anaconda3/lib/python3.6/site-packages/torch/nn/modules/upsampling.py", line 79, in forward
        return F.upsample(input, self.size, self.scale_factor, self.mode)
      File "/home/abhash/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1375, in upsample
        return _functions.thnn.UpsamplingBilinear2d.apply(input, _pair(size), scale_factor)
      File "/home/abhash/anaconda3/lib/python3.6/site-packages/torch/nn/_functions/thnn/upsampling.py", line 277, in forward
        ctx.output_size[1],
    TypeError: CudaSpatialUpSamplingBilinear_updateOutput received an invalid combination of arguments - got (int, torch.cuda.FloatTensor, torch.cuda.FloatTensor, float, float), but expected (int state, torch.cuda.FloatTensor input, torch.cuda.FloatTensor output, int outputHeight, int outputWidth)

    Also, I am new to pytorch, so let pardon me if I am missing any obvious thing here. P.S.: anaconda3, python3.6 with GPU CUDA: nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2016 NVIDIA Corporation Built on Tue_Jan_10_13:22:03_CST_2017 Cuda compilation tools, release 8.0, V8.0.61

    opened by sinha-abhash 3
  • performance

    Why is the result on the val set more than three points different from that published by the author? If I want to use it in my WSSS job,will it have a big impact? Thanks!!

    opened by aoyanl 1
  • read the ground truth of the pascal voc dataset

    hello, the code you use to read the ground truth of dataset is

        gt_temp = cv2.imread(os.path.join(gt_path,piece+'.png'))[:,:,0]
        gt_temp[gt_temp == 255] = 0
    

    why there is "[:,:,0]" for the ground truth? thank you very much! and when I use cv2 to read the ground truth of pascal voc dataset,

    import cv2
    import numpy as np
    img_cv= cv2.imread('/Users/zhangying/Desktop/2007_000129.png')
    img_array_cv = img_cv[:,:,0]
    print('use the method CV\n ',np.unique(img_array_cv))
    

    the output is

    use the method CV
      [  0 128 192]
    

    however the sum of the categories is 21 for voc dataset(128>20,192>20, is it reasonable?). How do you think about this problem? Thanks a lot!

    opened by zhangyingbit 0
  • network stucture issue

    I just go through the model in MS_Deeplab. In forward pass, the input image x has been pass through Resnet 4 times with different scales. But according to original paper, multi scale should happen after layer5 in resnet. May I know why implementation this way?

    opened by Kelvin945 0
  • docopt

    Hi to all! I have never used docopt package before. To be able to execute the train.py file, which argument should I input to docstr variable? Im getting an error at "args = docopt(docstr, version='v0.1')"

    The error I get is as belows:

      File "/media/sinem/LENOVO/deeplab-resnet-pytorch/train.py", line 41, in <module>
        args = docopt(docstr, version='v0.1')
      File "/usr/local/lib/python2.7/dist-packages/docopt.py", line 558, in docopt
        DocoptExit.usage = printable_usage(doc)
      File "/usr/local/lib/python2.7/dist-packages/docopt.py", line 468, in printable_usage
        raise DocoptLanguageError('"usage:" (case-insensitive) not found.')
    docopt.DocoptLanguageError: "usage:" (case-insensitive) not found. 
    
    

    Do you have a suggestion on how i can solve this?

    I installed the latest version: docopt-0.6.2.tar.gz

    Cheers, sinem.

    opened by siinem 2
  • ASPP or LargeFOV? Should be 76.35%.

    I have a question about the performance. "This is in comparision to 75.54% that is acheived by using train_iter_20000.caffemodel released by authors, which can be replicated by running this file . The .pth model converted from .caffemodel using the first section also gives 75.54% mean IOU. " But when I checked the model file and the trained .pth file, I found that the model applied ASPP instead of LargeFOV, which means the model performance should have achieved 76.35% as depicted in the paper. Why the performance is 75.54?

    deeplab

    opened by chenyzh28 0
Owner
Isht Dwivedi