This is an (re-)implementation of DeepLab-ResNet in TensorFlow for semantic image segmentation on the PASCAL VOC dataset.

Overview

DeepLab-ResNet-TensorFlow

This is an (re-)implementation of DeepLab-ResNet in TensorFlow for semantic image segmentation on the PASCAL VOC dataset.

Updates

29 Jan, 2017:

  • Fixed the implementation of the batch normalisation layer: it now supports both the training and inference steps. If the flag --is-training is provided, the running means and variances will be updated; otherwise, they will be kept intact. The .ckpt files have been updated accordingly - to download please refer to the new link provided below.
  • Image summaries during the training process can now be seen using TensorBoard.
  • Fixed the evaluation procedure: the 'void' label (255) is now correctly ignored. As a result, the performance score on the validation set has increased to 80.1%.

Model Description

The DeepLab-ResNet is built on a fully convolutional variant of ResNet-101 with atrous (dilated) convolutions, atrous spatial pyramid pooling, and multi-scale inputs (not implemented here).

The model is trained on a mini-batch of images and corresponding ground truth masks with the softmax classifier at the top. During training, the masks are downsampled to match the size of the output from the network; during inference, to acquire the output of the same size as the input, bilinear upsampling is applied. The final segmentation mask is computed using argmax over the logits. Optionally, a fully-connected probabilistic graphical model, namely, CRF, can be applied to refine the final predictions. On the test set of PASCAL VOC, the model achieves 79.7% of mean intersection-over-union.

For more details on the underlying model please refer to the following paper:

@article{CP2016Deeplab,
  title={DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs},
  author={Liang-Chieh Chen and George Papandreou and Iasonas Kokkinos and Kevin Murphy and Alan L Yuille},
  journal={arXiv:1606.00915},
  year={2016}
}

Requirements

TensorFlow needs to be installed before running the scripts. TensorFlow 0.12 is supported; for TensorFlow 0.11 please refer to this branch.

To install the required python packages (except TensorFlow), run

pip install -r requirements.txt

or for a local installation

pip install -user -r requirements.txt

Caffe to TensorFlow conversion

To imitate the structure of the model, we have used .caffemodel files provided by the authors. The conversion has been performed using Caffe to TensorFlow with an additional configuration for atrous convolution and batch normalisation (since the batch normalisation provided by Caffe-tensorflow only supports inference). There is no need to perform the conversion yourself as you can download the already converted models - deeplab_resnet.ckpt (pre-trained) and deeplab_resnet_init.ckpt (the last layers are randomly initialised) - here.

Nevertheless, it is easy to perform the conversion manually, given that the appropriate .caffemodel file has been downloaded, and Caffe to TensorFlow dependencies have been installed. The Caffe model definition is provided in misc/deploy.prototxt. To extract weights from .caffemodel, run the following:

python convert.py /path/to/deploy/prototxt --caffemodel /path/to/caffemodel --data-output-path /where/to/save/numpy/weights

As a result of running the command above, the model weights will be stored in /where/to/save/numpy/weights. To convert them to the native TensorFlow format (.ckpt), simply execute:

python npy2ckpt.py /where/to/save/numpy/weights --save-dir=/where/to/save/ckpt/weights

Dataset and Training

To train the network, one can use the augmented PASCAL VOC 2012 dataset with 10582 images for training and 1449 images for validation.

The training script allows to monitor the progress in the optimisation process using TensorBoard's image summary. Besides that, one can also exploit random scaling of the inputs during training as a means for data augmentation. For example, to train the model from scratch with random scale turned on, simply run:

python train.py --random-scale

To see the documentation on each of the training settings run the following:

python train.py --help

An additional script, fine_tune.py, demonstrates how to train only the last layers of the network.

Evaluation

The single-scale model shows 80.1% mIoU on the Pascal VOC 2012 validation dataset. No post-processing step with CRF is applied.

The following command provides the description of each of the evaluation settings:

python evaluate.py --help

Inference

To perform inference over your own images, use the following command:

python inference.py /path/to/your/image /path/to/ckpt/file

This will run the forward pass and save the resulted mask with this colour map:

Missing features

At the moment, the post-processing step with CRF is not implemented. Besides that, multi-scale inputs are missing, as well. No weight regularisation is applied.

Other implementations

You might also like...
PyTorch implementation of MoCo v3 for self-supervised ResNet and ViT.

MoCo v3 for Self-supervised ResNet and ViT Introduction This is a PyTorch implementation of MoCo v3 for self-supervised ResNet and ViT. The original M

 Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP
Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP

Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP Abstract: We introduce a method that allows to automatically se

 U-Net Implementation: Convolutional Networks for Biomedical Image Segmentation
U-Net Implementation: Convolutional Networks for Biomedical Image Segmentation" using the Carvana Image Masking Dataset in PyTorch

U-Net Implementation By Christopher Ley This is my interpretation and implementation of the famous paper "U-Net: Convolutional Networks for Biomedical

Pytorch implementation for Semantic Segmentation/Scene Parsing on MIT ADE20K dataset
Pytorch implementation for Semantic Segmentation/Scene Parsing on MIT ADE20K dataset

Semantic Segmentation on MIT ADE20K dataset in PyTorch This is a PyTorch implementation of semantic segmentation models on MIT ADE20K scene parsing da

Pytorch implementation for Semantic Segmentation/Scene Parsing on MIT ADE20K dataset
Pytorch implementation for Semantic Segmentation/Scene Parsing on MIT ADE20K dataset

Semantic Segmentation on MIT ADE20K dataset in PyTorch This is a PyTorch implementation of semantic segmentation models on MIT ADE20K scene parsing da

Tensorflow 2.x implementation of Panoramic BlitzNet for object detection and semantic segmentation on indoor panoramic images.
Tensorflow 2.x implementation of Panoramic BlitzNet for object detection and semantic segmentation on indoor panoramic images.

Deep neural network for object detection and semantic segmentation on indoor panoramic images. The implementation is based on the papers:

A repository that shares tuning results of trained models generated by TensorFlow / Keras. Post-training quantization (Weight Quantization, Integer Quantization, Full Integer Quantization, Float16 Quantization), Quantization-aware training. TensorFlow Lite. OpenVINO. CoreML. TensorFlow.js. TF-TRT. MediaPipe. ONNX. [.tflite,.h5,.pb,saved_model,tfjs,tftrt,mlmodel,.xml/.bin, .onnx] 3D ResNet Video Classification accelerated by TensorRT
3D ResNet Video Classification accelerated by TensorRT

Activity Recognition TensorRT Perform video classification using 3D ResNets trained on Kinetics-400 dataset and accelerated with TensorRT P.S Click on

Pretrained models for Jax/Flax: StyleGAN2, GPT2, VGG, ResNet.
Pretrained models for Jax/Flax: StyleGAN2, GPT2, VGG, ResNet.

Pretrained models for Jax/Flax: StyleGAN2, GPT2, VGG, ResNet.

Comments
  • why should the network be initialized with the input img during the procedure of inference

    why should the network be initialized with the input img during the procedure of inference

    hello, I have read part of the source code of these project, and I find the content in the inference.py module from line 54 to line 63, and I pasts it at here as following:

    Prepare image.

    img_orig = tf.image.decode_jpeg(tf.read_file(args.img_path), channels=3)
    # Convert RGB to BGR.
    img_r, img_g, img_b = tf.split(axis=2, num_or_size_splits=3, value=img_orig)
    img = tf.cast(tf.concat(axis=2, values=[img_b, img_g, img_r]), dtype=tf.float32)
    # Extract mean.
    img -= IMG_MEAN 
    
    # Create network.
    net = DeepLabResNetModel({'data': tf.expand_dims(img, dim=0)}, is_training=False)
    

    However, as far as my concern, to utilize a trained model to do inference, the most common way should be like that: (1) build the network and load the network weight param got from the previous trainning (2)input the image for inference anyway, according to the source code like above, the input image for inferencing was utilized as the parameter for the network initialization. My question is, why did the input image should be utilized as a parameter for the networkinitialization, can I init the network alone without it, so as to executing the inference of this model more efficiently(I means, initializinig the model at beginning only once, and to do the inference of several images one by one next)

    Thank for your help.

    opened by zjbit 0
  • "TypeError: Expected int32, got list containing Tensors of type '_Message' instead"

    I tried running train.py and I've got the following error: Traceback (most recent call last): File "train.py", line 187, in main() File "train.py", line 105, in main coord) File "C:\Users\Matheus\Desktop\UFRGS\Mestrado\tutorial_python\APLICACAO_A_AGRICULTURA\tensorflow-deeplab-resnet-crf-master\deeplab_resnet\image_reader.py", line 95, in init self.image, self.label = read_images_from_disk(self.queue, self.input_size, random_scale) File "C:\Users\Matheus\Desktop\UFRGS\Mestrado\tutorial_python\APLICACAO_A_AGRICULTURA\tensorflow-deeplab-resnet-crf-master\deeplab_resnet\image_reader.py", line 64, in read_images_from_disk img = tf.cast(tf.concat(2, [img_b, img_g, img_r]), dtype=tf.int32) File "C:\Users\Matheus\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\array_ops.py", line 1172, in concat dtype=dtypes.int32).get_shape().assert_is_compatible_with( File "C:\Users\Matheus\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 946, in convert_to_tensor as_ref=False) File "C:\Users\Matheus\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 1036, in internal_convert_to_tensor ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref) File "C:\Users\Matheus\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\constant_op.py", line 235, in _constant_tensor_conversion_function return constant(v, dtype=dtype, name=name) File "C:\Users\Matheus\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\constant_op.py", line 214, in constant value, dtype=dtype, shape=shape, verify_shape=verify_shape)) File "C:\Users\Matheus\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\tensor_util.py", line 433, in make_tensor_proto _AssertCompatible(values, dtype) File "C:\Users\Matheus\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\tensor_util.py", line 344, in _AssertCompatible (dtype.name, repr(mismatch), type(mismatch).name)) TypeError: Expected int32, got list containing Tensors of type '_Message' instead.

    Can you help me? =(

    opened by matheuscass 0
Owner
null
A transformer which can randomly augment VOC format dataset (both image and bbox) online.

VocAug It is difficult to find a script which can augment VOC-format dataset, especially the bbox. Or find a script needs complex requirements so it i

Coder.AN 1 Mar 5, 2022
AutoDeeplab / auto-deeplab / AutoML for semantic segmentation, implemented in Pytorch

AutoML for Image Semantic Segmentation Currently this repo contains the only working open-source implementation of Auto-Deeplab which, by the way out-

AI Necromancer 299 Dec 17, 2022
Json2Xml tool will help you convert from json COCO format to VOC xml format in Object Detection Problem.

JSON 2 XML All codes assume running from root directory. Please update the sys path at the beginning of the codes before running. Over View Json2Xml t

Nguyễn Trường Lâu 6 Aug 22, 2022
Txt2Xml tool will help you convert from txt COCO format to VOC xml format in Object Detection Problem.

TXT 2 XML All codes assume running from root directory. Please update the sys path at the beginning of the codes before running. Over View Txt2Xml too

Nguyễn Trường Lâu 4 Nov 24, 2022
LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation (NeurIPS2021 Benchmark and Dataset Track)

LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation by Junjue Wang, Zhuo Zheng, Ailong Ma, Xiaoyan Lu, and Yanfei Zh

Kingdrone 174 Dec 22, 2022
Recall Loss for Semantic Segmentation (This repo implements the paper: Recall Loss for Semantic Segmentation)

Recall Loss for Semantic Segmentation (This repo implements the paper: Recall Loss for Semantic Segmentation) Download Synthia dataset The model uses

null 32 Sep 21, 2022
Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation, CVPR 2018

Learning Pixel-level Semantic Affinity with Image-level Supervision This code is deprecated. Please see https://github.com/jiwoon-ahn/irn instead. Int

Jiwoon Ahn 337 Dec 15, 2022
Quickly comparing your image classification models with the state-of-the-art models (such as DenseNet, ResNet, ...)

Image Classification Project Killer in PyTorch This repo is designed for those who want to start their experiments two days before the deadline and ki

null 349 Dec 8, 2022
improvement of CLIP features over the traditional resnet features on the visual question answering, image captioning, navigation and visual entailment tasks.

CLIP-ViL In our paper "How Much Can CLIP Benefit Vision-and-Language Tasks?", we show the improvement of CLIP features over the traditional resnet fea

null 310 Dec 28, 2022
PyTorch implementation of the R2Plus1D convolution based ResNet architecture described in the paper "A Closer Look at Spatiotemporal Convolutions for Action Recognition"

R2Plus1D-PyTorch PyTorch implementation of the R2Plus1D convolution based ResNet architecture described in the paper "A Closer Look at Spatiotemporal

Irhum Shafkat 342 Dec 16, 2022