Image Segmentation and Object Detection in Pytorch

Pytorch-Segmentation-Detection is a library for image segmentation and object detection. It provides results on common image segmentation and object detection datasets, the pretrained models that achieve them, and scripts to reproduce them.

Segmentation

PASCAL VOC 2012

The implemented models were evaluated on either the restricted PASCAL VOC 2012 validation set (RV-VOC12) or the full PASCAL VOC 2012 validation set (VOC12), and were trained on the PASCAL VOC 2012 training data together with the additional Berkeley segmentation annotations for PASCAL VOC 2012.

You can find all the scripts that were used for training and evaluation here.

This code has been used to train networks with the following performance:

Model                 | Test data | Mean IoU | Mean pix. accuracy | Pixel accuracy | Inference time (512x512 px) | Model download | Related paper
----------------------|-----------|----------|--------------------|----------------|-----------------------------|----------------|--------------
Resnet-18-8s          | RV-VOC12  | 59.0     | in prog.           | in prog.       | 28 ms                       | Dropbox        | DeepLab
Resnet-34-8s          | RV-VOC12  | 68.0     | in prog.           | in prog.       | 50 ms                       | Dropbox        | DeepLab
Resnet-50-16s         | VOC12     | 66.5     | in prog.           | in prog.       | in prog.                    | in prog.       | DeepLab
Resnet-50-8s          | VOC12     | 67.0     | in prog.           | in prog.       | in prog.                    | in prog.       | DeepLab
Resnet-50-8s-deep-sup | VOC12     | 67.1     | in prog.           | in prog.       | in prog.                    | in prog.       | DeepLab
Resnet-101-16s        | VOC12     | 68.6     | in prog.           | in prog.       | in prog.                    | in prog.       | DeepLab
PSP-Resnet-18-8s      | VOC12     | 68.3     | n/a                | n/a            | n/a                         | in prog.       | PSPNet
PSP-Resnet-50-8s      | VOC12     | 73.6     | n/a                | n/a            | n/a                         | in prog.       | PSPNet
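The three metric columns are conventionally derived from a per-class confusion matrix; the table reports them as percentages. The following is a minimal pure-Python sketch of that convention, not the repository's own evaluation code: the names `confusion_matrix`, `segmentation_metrics` and `IGNORE_LABEL` are hypothetical, and it assumes that pixels labelled 255 (the PASCAL VOC void/boundary label) are ignored, that a class enters the mean IoU only if it occurs in the ground truth or the prediction, and that it enters the mean pixel accuracy only if it occurs in the ground truth.

```python
from typing import Dict, List, Sequence

IGNORE_LABEL = 255  # assumed void label (VOC marks object boundaries with 255)


def confusion_matrix(gt: Sequence[int], pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count (true class, predicted class) pairs over flattened label maps."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == IGNORE_LABEL:
            continue  # void pixels are excluded from every metric
        cm[g][p] += 1
    return cm


def segmentation_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Pixel accuracy, mean per-class pixel accuracy and mean IoU, as fractions."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[c][c] for c in range(n))
    class_acc, class_iou = [], []
    for c in range(n):
        tp = cm[c][c]
        gt_c = sum(cm[c])                         # pixels whose true class is c
        pred_c = sum(cm[r][c] for r in range(n))  # pixels predicted as c
        union = gt_c + pred_c - tp
        if gt_c > 0:
            class_acc.append(tp / gt_c)
        if union > 0:
            class_iou.append(tp / union)
    return {
        "pixel_accuracy": correct / total,
        "mean_pixel_accuracy": sum(class_acc) / len(class_acc),
        "mean_iou": sum(class_iou) / len(class_iou),
    }


# Toy example: two classes and one void pixel.
gt = [0, 0, 1, 1, 255]
pred = [0, 1, 1, 1, 0]
cm = confusion_matrix(gt, pred, num_classes=2)
print(cm)  # [[1, 1], [0, 2]]
print(segmentation_metrics(cm))  # pixel_accuracy 0.75, mean_pixel_accuracy 0.75, mean_iou ~0.583
```

Multiplying the returned fractions by 100 gives numbers on the same scale as the table.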

Some qualitative results:

[Image: qualitative segmentation results on PASCAL VOC 2012]

Endovis 2017

The models were trained on the Endovis 2017 segmentation dataset. Sequence number 3 was held out for validation and was not included in the training set.
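Holding out a whole sequence, rather than random frames, keeps the near-duplicate neighbouring frames of sequence 3 out of training. A minimal sketch of such a split, assuming samples are indexed by (sequence id, frame name) pairs; `split_by_sequence` and this sample layout are hypothetical and are not the library's data loader.

```python
from typing import Iterable, List, Tuple

Sample = Tuple[int, str]  # hypothetical layout: (sequence id, frame file name)


def split_by_sequence(samples: Iterable[Sample],
                      val_sequences=(3,)) -> Tuple[List[Sample], List[Sample]]:
    """Send every frame of a held-out sequence to validation, the rest to training."""
    train: List[Sample] = []
    val: List[Sample] = []
    for seq_id, frame in samples:
        (val if seq_id in val_sequences else train).append((seq_id, frame))
    return train, val


samples = [(1, "frame000.png"), (3, "frame000.png"),
           (2, "frame001.png"), (3, "frame001.png")]
train, val = split_by_sequence(samples)
print(len(train), len(val))  # 2 2
```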

The code for acquiring the training data and for training and validating the models is also provided in the library.

Additional qualitative results can be found in this YouTube playlist.

Binary Segmentation

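Model        | Test data | Mean IoU | Mean pix. accuracy | Pixel accuracy | Inference time (512x512 px) | Model download
-------------|-----------|----------|--------------------|----------------|-----------------------------|---------------
Resnet-9-8s  | Seq #3 *  | 96.1     | in prog.           | in prog.       | 13.3 ms                     | Dropbox
Resnet-18-8s | Seq #3    | 96.0     | in prog.           | in prog.       | 28 ms                       | Dropbox
Resnet-34-8s | Seq #3    | in prog. | in prog.           | in prog.       | 50 ms                       | in prog.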

* The Resnet-9-8s network was tested at half the original resolution (512 x 640).
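The "-8s", "-16s" and "-32s" suffixes in the model names appear to denote the network's output stride (the Resnet34_8s constructor shown in one of the tracebacks in the comments passes output_stride=8), i.e. the class-score map is predicted at 1/8, 1/16 or 1/32 of the input resolution before being brought back to full size. The arithmetic sketch below makes this concrete for the half-resolution test above. The helper names are hypothetical, the full frame size of 1024 x 1280 is only inferred from 512 x 640 being half of it, and the ceiling rounding assumes ResNet-style padded stride-2 layers (for inputs divisible by the stride the result is exact either way).

```python
import math
from typing import Tuple


def scaled_size(height: int, width: int, factor: float) -> Tuple[int, int]:
    """Input size after resizing a frame by `factor` (0.5 = half resolution)."""
    return round(height * factor), round(width * factor)


def score_map_size(height: int, width: int, output_stride: int) -> Tuple[int, int]:
    """Spatial size of the class-score map before upsampling to the input size.

    Padded stride-2 convolutions and pooling round up at every stage, which
    composes to a single ceiling division by the total output stride.
    """
    return math.ceil(height / output_stride), math.ceil(width / output_stride)


full_h, full_w = 1024, 1280  # assumed full Endovis frame size (twice 512 x 640)
h, w = scaled_size(full_h, full_w, 0.5)
print(h, w)  # 512 640
for stride in (8, 16, 32):
    print(stride, score_map_size(h, w, stride))
# 8 (64, 80)
# 16 (32, 40)
# 32 (16, 20)
```

A smaller output stride keeps a denser score map (here 64 x 80 instead of 16 x 20) at the cost of more computation in the later layers.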

Qualitative results (on validation sequence):

[Image: qualitative binary segmentation results on the validation sequence]

Multi-class Segmentation

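Model        | Test data | Mean IoU | Mean pix. accuracy | Pixel accuracy | Inference time (512x512 px) | Model download
-------------|-----------|----------|--------------------|----------------|-----------------------------|---------------
Resnet-18-8s | Seq #3    | 81.0     | in prog.           | in prog.       | 28 ms                       | Dropbox
Resnet-34-8s | Seq #3    | in prog. | in prog.           | in prog.       | 50 ms                       | in prog.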

Qualitative results (on validation sequence):

[Image: qualitative multi-class segmentation results on the validation sequence]

Cityscapes

The dataset contains video sequences recorded in street scenes from 50 different cities, with high-quality pixel-level annotations of 5,000 frames. The annotations contain 19 classes, such as car, road, and traffic sign.
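Cityscapes ground-truth files store the original label IDs, of which only 19 classes are used for training and evaluation, so a common preprocessing step remaps them to contiguous train IDs 0 to 18 and sends everything else to an ignore value. The table below follows, to the best of our knowledge, the labelId-to-trainId convention of the official cityscapesScripts `labels.py`; whether this repository's dataset loader uses exactly this table is an assumption, and `LABEL_TO_TRAIN`, `IGNORE` and `to_train_ids` are hypothetical names.

```python
from typing import Dict, List

IGNORE = 255  # assumed ignore value for classes that are not evaluated

# labelId -> trainId for the 19 evaluated classes (cityscapesScripts convention);
# every other labelId maps to IGNORE.
LABEL_TO_TRAIN: Dict[int, int] = {
    7: 0,    # road
    8: 1,    # sidewalk
    11: 2,   # building
    12: 3,   # wall
    13: 4,   # fence
    17: 5,   # pole
    19: 6,   # traffic light
    20: 7,   # traffic sign
    21: 8,   # vegetation
    22: 9,   # terrain
    23: 10,  # sky
    24: 11,  # person
    25: 12,  # rider
    26: 13,  # car
    27: 14,  # truck
    28: 15,  # bus
    31: 16,  # train
    32: 17,  # motorcycle
    33: 18,  # bicycle
}


def to_train_ids(label_row: List[int]) -> List[int]:
    """Remap one row of a labelId image to trainIds."""
    return [LABEL_TO_TRAIN.get(label_id, IGNORE) for label_id in label_row]


print(to_train_ids([7, 0, 26, 9]))  # [0, 255, 13, 255]
```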

Model             | Test data      | Mean IoU | Mean pix. accuracy | Pixel accuracy | Inference time (512x512 px) | Model download
------------------|----------------|----------|--------------------|----------------|-----------------------------|---------------
Resnet-18-32s     | Validation set | 61.0     | in prog.           | in prog.       | in prog.                    | in prog.
Resnet-18-8s      | Validation set | 60.0     | in prog.           | in prog.       | 28 ms                       | Dropbox
Resnet-34-8s      | Validation set | 69.1     | in prog.           | in prog.       | 50 ms                       | Dropbox
Resnet-50-16s-PSP | Validation set | 71.2     | in prog.           | in prog.       | in prog.                    | in prog.

Qualitative results (on validation sequence):

The whole sequence can be viewed here.

[Image: qualitative results on Cityscapes]

Installation

This code requires:

  1. Pytorch.

  2. Some additional libraries, which can be obtained by installing the Anaconda distribution.

    Alternatively, you can install scikit-image, matplotlib, and numpy using pip.

  3. Clone the library:

git clone --recursive https://github.com/warmspringwinds/pytorch-segmentation-detection

Then run this code snippet before you start using the library:

import sys
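# Update these paths to point to your clone of the repository.
# (All the Jupyter notebooks in the repository already do this.)
sys.path.append("/your/path/pytorch-segmentation-detection/")
# Put the bundled pytorch/vision fork first so it takes precedence
# over any torchvision that is already installed.
sys.path.insert(0, "/your/path/pytorch-segmentation-detection/vision/")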

Here we use our fork of pytorch/vision, which might be merged upstream in the future. We have added it as a submodule of our repository.
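If you prefer not to hard-code both absolute paths, the same setup can be derived from a single repository root. This is a minimal sketch, assuming you pass in the location of your clone; `add_repo_to_path` is a hypothetical helper, not part of the library. It drops any earlier entry for the fork before inserting it at position 0, so the bundled pytorch/vision fork is found before an installed torchvision.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_root: str) -> Path:
    """Hypothetical helper: put the repo and its bundled vision/ fork on sys.path."""
    root = Path(repo_root).expanduser().resolve()
    vision = str(root / "vision")
    if str(root) not in sys.path:
        sys.path.append(str(root))  # makes pytorch_segmentation_detection importable
    while vision in sys.path:
        sys.path.remove(vision)     # drop stale entries so the insert below wins
    sys.path.insert(0, vision)      # the torchvision fork shadows any installed copy
    return root


# Usage (example path):
# add_repo_to_path("~/pytorch-segmentation-detection")
```

Call it before the first `import torchvision`: a module that has already been imported is not reloaded when `sys.path` changes.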

  4. Manually download the segmentation or detection models that you want to use (the download links are listed in the result tables above).

About

If you use this code in your research, please cite the paper:

@article{pakhomov2017deep,
  title={Deep Residual Learning for Instrument Segmentation in Robotic Surgery},
  author={Pakhomov, Daniil and Premachandran, Vittal and Allan, Max and Azizian, Mahdi and Navab, Nassir},
  journal={arXiv preprint arXiv:1703.08580},
  year={2017}
}

During implementation, some preliminary experiments and notes were reported:

Comments
  • Help cleans resnet train.ipynb -- use relative path imports, rid ACT module

    Hey there, thanks for this great repo. Very easy to use and clean implementation.

    This tiny PR cleans a couple things that might be helpful for others. It brings in the nice relative path imports in other notebooks that weren't in resnet_34_8s_train.ipynb. Also there was an import from adaptive_computation_time, but this module isn't needed.

    If I get around to it later I might also help add some documentation on setting up VOC for use with this repo.

    Cheers!

    opened by peteflorence 7
  • No such file or directory: 'resnet_34_8s_66.pth'

    Hi @warmspringwinds I am getting - IOError: [Errno 2] No such file or directory: 'resnet_34_8s_66.pth' when running resnet_34_8s_demo.ipynb

    I am not able to figure out when and how the model gets generated. Any clue on the next step?

    Thanks Akash

    opened by akashgoyal 6
  • Change call to .apply()

    Great script!

    Btw, .apply() uses .children(), which may lead to skipping convolutional layers in submodules. I replaced it with .modules(), in case you want to change it.

    bug 
    opened by prlz77 3
  • adaptive_computation_time

    Hi Daniil, I'm trying to work on image segmentation of microscopy images using PyTorch. I've been trying to work with your examples, but I'm getting an error in resnet_34_8s_train:

    ImportError: No module named adaptive_computation_time

    I wonder if it's something in the anaconda?

    bug 
    opened by hftsai 2
  • about image size of training set

    Hello Daniil, in the training process of your ResNet-8s, I notice that you crop all training images to 224x224 (RandomCropJoint(crop_size=(224, 224))). But you didn't adopt this approach when you trained your FCN-32s model. Is it because the ResNet pretrained model is used as initial weights, so we need to comply with its input image size (224x224) too? Do you think other input sizes can be used for training without causing an accuracy decline? Please advise. Thanks.

    opened by MahlerMozart 2
  • about Resnet18_8s

    Hello, I am very impressed by your great work! However, I am a little confused when I look at your Resnet18_8s network. I assume Resnet18_8s follows the approach in your paper "Deep Residual Learning for Instrument Segmentation in Robotic Surgery", which employs dilated convolutions to keep resolution. But in resnet_dilated.py, I could not find any dilated convolutions in class Resnet18_8s. Could you please give a more detailed explanation of the structure of Resnet18_8s? Many thanks.

    opened by MahlerMozart 2
  • Error when trying to run resnet_34_8s_test

    It says

    TypeError                                 Traceback (most recent call last)
    in ()
         34 img = Variable(img.cuda())
         35
    ---> 36 fcn = resnet_dilated.Resnet34_8s(num_classes=19)
         37 fcn.load_state_dict(torch.load('/home/sawyer/workspace/segmentation/resnet_34_8s_cityscapes_best.pth'))
         38 fcn.cuda()

    /home/sawyer/workspace/segmentation/pytorch-segmentation-detection/pytorch_segmentation_detection/models/resnet_dilated.pyc in __init__(self, num_classes)
        293 pretrained=True,
        294 output_stride=8,
    --> 295 remove_avg_pool_layer=True)
        296
        297 # Randomly initialize the 1x1 Conv scoring layer

    /home/sawyer/workspace/segmentation/pytorch-segmentation-detection/vision/torchvision/models/resnet.pyc in resnet34(pretrained, **kwargs)
        172 pretrained (bool): If True, returns a model pre-trained on ImageNet
        173 """
    --> 174 model = ResNet(BasicBlock, [3, 4, 6, 3], **kwargs)
        175 if pretrained:
        176 model.load_state_dict(model_zoo.load_url(model_urls['resnet34']))

    TypeError: __init__() got an unexpected keyword argument 'fully_conv'

    This happens with both Python 2.7 and Python 3.5, torch version 1.0.1.

    opened by ikoc 1
  • Convert map outputs to lists for Python3 support

    Hey Daniil, thanks for your work on this repo.

    I noticed that some places in the code use the output of the map function directly, which is a problem for Python3 since the function returns a map object instead of a list. I've just gone ahead and converted instances of map object to list in a bunch of places -- I noticed you did this for some files already, but it's currently breaking the cityscapes dataset, which I'm currently trying to use :-)

    The Python2 docs guarantee the result of map is always a list so I assume this shouldn't break anything.

    opened by erasaur 1
  • fully_conv in vgg16

    Very good repository.

    How did you make this work? It seems that vgg16 does not have the fully_conv keyword in torchvision:

    vgg16 = models.vgg16(pretrained=True,
                         fully_conv=True)
    
    opened by IssamLaradji 1
  • Optimizer for unet model on Pascal Voc segmentation

    Hello, Can I know the optimizer and its specifications to use on unet model on Pascal Voc segmentation using FOCAL loss ? Should I have to use any learning rate schedulers? Also, is it better to take mean focal loss or sum focal loss ? I am training the model from scratch.

    opened by saiphanish7 0
  • Error(s) in loading state_dict for Resnet18_8s

    @warmspringwinds I am getting the following error for resnet_18_8s_59.pth

    RuntimeError: Error(s) in loading state_dict for Resnet18_8s: size mismatch for resnet18_8s.fc.bias: copying a param with shape torch.Size([2]) from checkpoint, the shape in current model is torch.Size([21]). size mismatch for resnet18_8s.fc.weight: copying a param with shape torch.Size([2, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([21, 512, 1, 1]).

    If I change num_classes=21 to num_classes=2, it generates the output without any segmentation (purple screen)

    opened by sagar1garg 7
  • RuntimeError: Error(s) in loading_state_dict for VGG

    By using your fork of torchvision and default installation of pytorch for Linux-Python3.6-CUDA10:

    1. init_weights argument in the class VGG was missing.

    2. After fixing (1), the following error was generated:

    In [1]: from torchvision import models                                                                               
    In [2]: model = models.vgg16(pretrained=True, fully_conv=True)                                                       
    
    RuntimeError                              Traceback (most recent call last)
    <ipython-input-2-802ee77a237c> in <module>
    ----> 1 model = models.vgg16(pretrained=True, fully_conv=True)
    
    ~/repositories/github/pytorch-segmentation-detection/vision/torchvision/models/vgg.py in vgg16(pretrained, **kwargs)
        164     model = VGG(make_layers(cfg['D']), **kwargs)
        165     if pretrained:
    --> 166         model.load_state_dict(model_zoo.load_url(model_urls['vgg16']))
        167     return model
        168 
    
    ~/.local/lib/python3.6/site-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
        767         if len(error_msgs) > 0:
        768             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
    --> 769                                self.__class__.__name__, "\n\t".join(error_msgs)))
        770 
        771     def _named_members(self, get_members_fn, prefix='', recurse=True):
    
    RuntimeError: Error(s) in loading state_dict for VGG:
    	size mismatch for classifier.0.weight: copying a param with shape torch.Size([4096, 25088]) from checkpoint, the shape in current model is torch.Size([4096, 512, 7, 7]).
    	size mismatch for classifier.3.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 4096, 1, 1]).
    	size mismatch for classifier.6.weight: copying a param with shape torch.Size([1000, 4096]) from checkpoint, the shape in current model is torch.Size([1000, 4096, 1, 1]).
    
    opened by ar13pit 0
  • RuntimeError: value cannot be converted to type float without overflow

    Hi, I tried to train a model using Python 3, but I got the issue below:

    /home/v2m/anaconda3/envs/my_env3/lib/python3.7/site-packages/torch/nn/_reduction.py:46: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
      warnings.warn(warning.format(ret))
    /home/v2m/anaconda3/envs/my_env3/lib/python3.7/site-packages/torch/nn/functional.py:2622: UserWarning: nn.functional.upsample_bilinear is deprecated. Use nn.functional.interpolate instead.
      warnings.warn("nn.functional.upsample_bilinear is deprecated. Use nn.functional.interpolate instead.")
    0.4247354666311015
    0.5617590819998219
    0.5815541637890524
    0.6344758887881029
    Traceback (most recent call last):
      File "pytorch_segmentation_detection/recipes/pascal_voc/segmentation/psp_resnet_50_8s_train.py", line 376, in <module>
        optimizer.step()
      File "/home/v2m/anaconda3/envs/my_env3/lib/python3.7/site-packages/torch/optim/adam.py", line 107, in step
        p.data.addcdiv_(-step_size, exp_avg, denom)
    RuntimeError: value cannot be converted to type float without overflow: (3.52033e-08,-1.14383e-08)

    Can someone give me suggestion?

    opened by dgks0n 1
  • Difference Between Semantic Segmentation and Image Classification

    I'm new to implementing CNNs and I'm trying to understand how a model knows whether to perform semantic classification (pixelwise) or image classification (one class per image). As far as I can see, the only difference is in the models/resnet_dilated.py file in the lines resnet34_8s.fc = nn.Conv2d(resnet34_8s.inplanes, num_classes, 1)

    whereas most other codes have it as resnet34_8s.fc = nn.Conv2d(resnet34_8s.fc.in_features, num_classes)

    Is this the difference between returning a logits of shape [batch x num_classes x H x W] and [batch x num_classes]?

    opened by mugdhapolimera 0
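The Python 3 problem described in the "Convert map outputs to lists for Python3 support" comment is easy to reproduce: in Python 3, `map()` returns a lazy, single-pass iterator instead of a list, so code that indexes the result, calls `len()` on it, or iterates over it twice breaks. A minimal illustration; the helper name `strip_all` is hypothetical.

```python
def strip_all(lines):
    # list() restores the Python 2 behaviour: an indexable, reusable list.
    return list(map(str.strip, lines))


names = strip_all(["road\n", " car \n"])
print(names[1], len(names))  # car 2

lazy = map(str.strip, ["road\n", " car \n"])
try:
    lazy[0]
except TypeError as err:
    print("TypeError:", err)  # TypeError: 'map' object is not subscriptable

print(list(lazy))  # ['road', 'car']  (first full pass)
print(list(lazy))  # []  (the iterator is already exhausted)
```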
Owner
Daniil Pakhomov
PhD student at JHU. Research interests: image classification, image segmentation, face detection and face recognition, mostly based on deep learning.
Hybrid CenterNet - Hybrid-supervised object detection / Weakly semi-supervised object detection

Hybrid-Supervised Object Detection System Object detection system trained by hybrid-supervision/weakly semi-supervision (HSOD/WSSOD): This project is

null 5 Dec 10, 2022
Yolo object detection - Yolo object detection with python

How to run download required files make build_image make download Docker versio

null 3 Jan 26, 2022
Auto-Lama combines object detection and image inpainting to automate object removals

Auto-Lama Auto-Lama combines object detection and image inpainting to automate object removals. It is build on top of DE:TR from Facebook Research and

null 44 Dec 9, 2022
Official PyTorch implementation of Joint Object Detection and Multi-Object Tracking with Graph Neural Networks

This is the official PyTorch implementation of our paper: "Joint Object Detection and Multi-Object Tracking with Graph Neural Networks". Our project website and video demos are here.

Richard Wang 443 Dec 6, 2022
Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch.

Faster R-CNN and Mask R-CNN in PyTorch 1.0 maskrcnn-benchmark has been deprecated. Please see detectron2, which includes implementations for all model

Facebook Research 9k Jan 4, 2023
Tools to create pixel-wise object masks, bounding box labels (2D and 3D) and 3D object model (PLY triangle mesh) for object sequences filmed with an RGB-D camera.

Tools to create pixel-wise object masks, bounding box labels (2D and 3D) and 3D object model (PLY triangle mesh) for object sequences filmed with an RGB-D camera. This project prepares training and testing data for various deep learning projects such as 6D object pose estimation projects singleshotpose, as well as object detection and instance segmentation projects.

null 305 Dec 16, 2022
This project uses Template Matching technique for object detecting by detection of template image over base image.

Object Detection Project Using OpenCV This project uses Template Matching technique for object detecting by detection the template image over base ima

Pratham Bhatnagar 7 May 29, 2022
This project uses Template Matching technique for object detecting by detection of template image over base image

Object Detection Project Using OpenCV This project uses Template Matching technique for object detecting by detection the template image over base ima

Pratham Bhatnagar 4 Nov 16, 2021
Object tracking and object detection is applied to track golf puts in real time and display stats/games.

Putting_Game Object tracking and object detection is applied to track golf puts in real time and display stats/games. Works best with the Perfect Prac

Max 1 Dec 29, 2021
Complete-IoU (CIoU) Loss and Cluster-NMS for Object Detection and Instance Segmentation (YOLACT)

Complete-IoU Loss and Cluster-NMS for Improving Object Detection and Instance Segmentation. Our paper is accepted by IEEE Transactions on Cybernetics

null 290 Dec 25, 2022
Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Mask R-CNN for Object Detection and Segmentation This is an implementation of Mask R-CNN on Python 3, Keras, and TensorFlow. The model generates bound

Matterport, Inc 22.5k Jan 4, 2023
Object Detection and Multi-Object Tracking

Object Detection and Multi-Object Tracking

Bobby Chen 1.6k Jan 4, 2023
Detectron2 is FAIR's next-generation platform for object detection and segmentation.

Detectron2 is Facebook AI Research's next generation software system that implements state-of-the-art object detection algorithms. It is a ground-up r

Facebook Research 23.3k Jan 8, 2023
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.

Swin Transformer for Object Detection This repo contains the supported code and configuration files to reproduce object detection results of Swin Tran

Swin Transformer 1.4k Dec 30, 2022
BMW TechOffice MUNICH 148 Dec 21, 2022
Object detection and instance segmentation toolkit based on PaddlePaddle.

Object detection and instance segmentation toolkit based on PaddlePaddle.

null 9.3k Jan 2, 2023
Res2Net for Instance segmentation and Object detection using MaskRCNN

Res2Net for Instance segmentation and Object detection using MaskRCNN Since the MaskRCNN-benchmark of facebook is deprecated, we suggest to use our mm

Res2Net Applications 55 Oct 30, 2022
Tensorflow 2.x implementation of Panoramic BlitzNet for object detection and semantic segmentation on indoor panoramic images.

Deep neural network for object detection and semantic segmentation on indoor panoramic images. The implementation is based on the papers:

Alejandro de Nova Guerrero 9 Nov 24, 2022