Image Segmentation and Object Detection in Pytorch

Daniil Pakhomov

Last update: Dec 10, 2022

Related tags

Deep Learning pytorch-segmentation-detection

Overview

Image Segmentation and Object Detection in Pytorch

Pytorch-Segmentation-Detection is a library for image segmentation and object detection. It provides reported results on common image segmentation and object detection datasets, pretrained models, and scripts to reproduce those results.

Segmentation

PASCAL VOC 2012

The implemented models were tested on the Restricted PASCAL VOC 2012 Validation dataset (RV-VOC12) or the Full PASCAL VOC 2012 Validation dataset (VOC12). They were trained on the PASCAL VOC 2012 Training data plus the additional Berkeley segmentation data for PASCAL VOC 12.

You can find all the scripts that were used for training and evaluation here.

This code has been used to train networks with this performance:

Model	Test data	Mean IOU	Mean pix. accuracy	Pixel accuracy	Inference time (512x512 px. image)	Model Download Link	Related paper
Resnet-18-8s	RV-VOC12	59.0	in prog.	in prog.	28 ms.	Dropbox	DeepLab
Resnet-34-8s	RV-VOC12	68.0	in prog.	in prog.	50 ms.	Dropbox	DeepLab
Resnet-50-16s	VOC12	66.5	in prog.	in prog.	in prog.	in prog.	DeepLab
Resnet-50-8s	VOC12	67.0	in prog.	in prog.	in prog.	in prog.	DeepLab
Resnet-50-8s-deep-sup	VOC12	67.1	in prog.	in prog.	in prog.	in prog.	DeepLab
Resnet-101-16s	VOC12	68.6	in prog.	in prog.	in prog.	in prog.	DeepLab
PSP-Resnet-18-8s	VOC12	68.3	n/a	n/a	n/a	in prog.	PSPnet
PSP-Resnet-50-8s	VOC12	73.6	n/a	n/a	n/a	in prog.	PSPnet
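
The Mean IOU column in the table above is the intersection-over-union averaged over classes. The source does not show the exact evaluation protocol, so the following is only a minimal sketch in plain Python. The function name `mean_iou`, the flat-list input format, and the ignore label value 255 are assumptions made for illustration, not the library's API.

```python
def mean_iou(predictions, labels, num_classes, ignore_label=255):
    """Mean intersection-over-union over classes that occur at least once.

    predictions, labels: flat sequences of integer class ids, same length.
    ignore_label: label value skipped during evaluation (assumed to be 255).
    """
    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    label_count = [0] * num_classes

    for pred, label in zip(predictions, labels):
        if label == ignore_label:
            continue
        pred_count[pred] += 1
        label_count[label] += 1
        if pred == label:
            intersection[pred] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + label_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both prediction and label
            ious.append(intersection[c] / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: 6 pixels, 2 classes, one ignored pixel.
preds = [0, 0, 1, 1, 1, 0]
labels = [0, 1, 1, 1, 255, 0]
print(round(100 * mean_iou(preds, labels, num_classes=2), 1))  # 66.7
```

In the toy example each class has an intersection of 2 pixels and a union of 3, so both per-class scores are 2/3.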

Some qualitative results:

Endovis 2017

The implemented models were trained on the Endovis 2017 segmentation dataset. Sequence number 3 was used for validation and was not included in the training dataset.

The code for training and validating the models is also provided in the library.

Additional qualitative results can be found in this YouTube playlist.

Binary Segmentation

Model	Test data	Mean IOU	Mean pix. accuracy	Pixel accuracy	Inference time (512x512 px. image)	Model Download Link
Resnet-9-8s	Seq # 3 *	96.1	in prog.	in prog.	13.3 ms.	Dropbox
Resnet-18-8s	Seq # 3	96.0	in prog.	in prog.	28 ms.	Dropbox
Resnet-34-8s	Seq # 3	in prog.	in prog.	in prog.	50 ms.	in prog.

* The Resnet-9-8s network was tested at 0.5 reduced resolution (512 x 640).

Qualitative results (on validation sequence):

Multi-class Segmentation

Model	Test data	Mean IOU	Mean pix. accuracy	Pixel accuracy	Inference time (512x512 px. image)	Model Download Link
Resnet-18-8s	Seq # 3	81.0	in prog.	in prog.	28 ms.	Dropbox
Resnet-34-8s	Seq # 3	in prog.	in prog.	in prog.	50 ms.	in prog.

Qualitative results (on validation sequence):

Cityscapes

The dataset contains video sequences recorded in street scenes from 50 different cities, with high quality pixel-level annotations of 5 000 frames. The annotations contain 19 classes which represent cars, road, traffic signs and so on.

Model	Test data	Mean IOU	Mean pix. accuracy	Pixel accuracy	Inference time (512x512 px. image)	Model Download Link
Resnet-18-32s	Validation set	61.0	in prog.	in prog.	in prog.	in prog.
Resnet-18-8s	Validation set	60.0	in prog.	in prog.	28 ms.	Dropbox
Resnet-34-8s	Validation set	69.1	in prog.	in prog.	50 ms.	Dropbox
Resnet-50-16s-PSP	Validation set	71.2	in prog.	in prog.	in prog.	in prog.

Qualitative results (on validation sequence):

The whole sequence can be viewed here.

Installation

This code requires:

Pytorch.
Some libraries that can be acquired by installing the Anaconda package. Alternatively, you can install scikit-image, matplotlib and numpy using pip.

Clone the library:

git clone --recursive https://github.com/warmspringwinds/pytorch-segmentation-detection

And use this code snippet before you start to use the library:

import sys
# update with your path
# All the jupyter notebooks in the repository already have this
sys.path.append("/your/path/pytorch-segmentation-detection/")
sys.path.insert(0, '/your/path/pytorch-segmentation-detection/vision/')

Here we use our fork of pytorch/vision, which might be merged upstream in the future. We have added it as a submodule to our repository.
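
The path setup can also be wrapped in a small helper that fails early when the submodule is missing. This is a sketch only: the function name `add_repo_to_path` is hypothetical, and the check for a `vision/torchvision` directory assumes the submodule layout that appears in the tracebacks quoted in the comments further down.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_root):
    """Put the repository and its bundled vision fork on sys.path.

    Returns the list of paths that were newly added. Raises FileNotFoundError
    if the vision submodule looks empty, which usually means the repository
    was cloned without --recursive.
    """
    repo_root = Path(repo_root).expanduser().resolve()
    vision_dir = repo_root / "vision"
    if not (vision_dir / "torchvision").is_dir():
        raise FileNotFoundError(
            f"{vision_dir / 'torchvision'} not found; "
            "was the repository cloned with --recursive?"
        )

    repo_str, vision_str = str(repo_root), str(vision_dir)
    added = []
    if repo_str not in sys.path:
        sys.path.append(repo_str)
        added.append(repo_str)
    if vision_str not in sys.path:
        # Inserted at the front so the fork shadows an installed torchvision.
        sys.path.insert(0, vision_str)
        added.append(vision_str)
    return added


# Usage (update with your path):
# add_repo_to_path("/your/path/pytorch-segmentation-detection/")
```

The helper mirrors the snippet above: the repository root is appended, while the fork is inserted at position 0.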

Manually download the segmentation or detection models that you want to use (links can be found in the results tables above).

About

If you use this code in your research, please cite the paper:

@article{pakhomov2017deep,
  title={Deep Residual Learning for Instrument Segmentation in Robotic Surgery},
  author={Pakhomov, Daniil and Premachandran, Vittal and Allan, Max and Azizian, Mahdi and Navab, Nassir},
  journal={arXiv preprint arXiv:1703.08580},
  year={2017}
}

During implementation, some preliminary experiments and notes were reported:

Comments

Help clean resnet_34_8s_train.ipynb: use relative path imports, remove ACT module

Hey there, thanks for this great repo. Very easy to use and clean implementation.

This tiny PR cleans a couple things that might be helpful for others. It brings in the nice relative path imports in other notebooks that weren't in resnet_34_8s_train.ipynb. Also there was an import from adaptive_computation_time, but this module isn't needed.

If I get around to it later I might also help add some documentation on setting up VOC for use with this repo.

Cheers!

opened by peteflorence 7
No such file or directory: 'resnet_34_8s_66.pth'

Hi @warmspringwinds I am getting - IOError: [Errno 2] No such file or directory: 'resnet_34_8s_66.pth' when running resnet_34_8s_demo.ipynb

I am not able to figure out when and how the model file is generated. Any clue on the next step?

Thanks, Akash

opened by akashgoyal 6
Change call to .apply()

Great script!

Btw, .apply() uses .children(), which may lead to skipping convolutional layers in submodules. I replaced it with .modules(), in case you want to change it.
bug

opened by prlz77 3
adaptive_computation_time

Hi Daniil, I'm trying to work on image segmentation of microscopy images using PyTorch. I've been trying to work with your examples, but I'm getting an error in resnet_34_8s_train.

ImportError: No module named adaptive_computation_time

I wonder if it's something in Anaconda?
bug

opened by hftsai 2
about image size of training set

Hello Daniil, in the training process of your ResNet-8s, I noticed that you crop all training images to 224x224 (RandomCropJoint(crop_size=(224, 224))). But you didn't adopt this approach when you trained your FCN-32s model. Is it because the ResNet pretrained model is used for the initial weights, so we need to comply with its input image size (224x224) too? Do you think other input sizes can be used for training without causing an accuracy decline? Please advise. Thanks.

opened by MahlerMozart 2
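
For readers unfamiliar with the term, a joint random crop cuts the same window out of an image and its annotation so that pixels and labels stay aligned. The sketch below illustrates the idea on plain nested lists. It is not the library's `RandomCropJoint` implementation; the function name `random_crop_joint` and its arguments are assumptions made for this example.

```python
import random


def random_crop_joint(image, annotation, crop_size, rng=random):
    """Cut the same randomly placed window out of an image and its label map.

    image, annotation: 2-D lists (rows of pixels) with identical height/width.
    crop_size: (crop_height, crop_width); it must fit inside the inputs.
    rng: the random module or a random.Random instance.
    """
    height, width = len(image), len(image[0])
    crop_h, crop_w = crop_size
    if crop_h > height or crop_w > width:
        raise ValueError("crop_size must not exceed the image size")

    # One offset is drawn and reused, so pixels and labels stay aligned.
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)

    def crop(rows):
        return [row[left:left + crop_w] for row in rows[top:top + crop_h]]

    return crop(image), crop(annotation)


# Toy 4x5 image whose annotation marks pixels with a value above 9.
image = [[5 * r + c for c in range(5)] for r in range(4)]
annotation = [[int(value > 9) for value in row] for row in image]

img_crop, ann_crop = random_crop_joint(image, annotation, (2, 3), random.Random(0))
print(len(img_crop), len(img_crop[0]))  # 2 3
```

Because both crops share one offset, recomputing the annotation rule on the cropped image reproduces the cropped annotation.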
about Resnet18_8s

Hello, I am very impressed by your great work! However, I am a little confused when I look at your Resnet18_8s network. I assume Resnet18_8s follows the approach in your paper "Deep Residual Learning for Instrument Segmentation in Robotic Surgery", which employs dilated convolutions to keep resolution. But in resnet_dilated.py, I could not find any dilated convolutions in the class Resnet18_8s. Could you please give a more detailed explanation of the structure of Resnet18_8s? Many thanks.

opened by MahlerMozart 2
Error when trying to run resnet_34_8s_test

It says

TypeError Traceback (most recent call last)
in ()
     34 img = Variable(img.cuda())
     35
---> 36 fcn = resnet_dilated.Resnet34_8s(num_classes=19)
     37 fcn.load_state_dict(torch.load('/home/sawyer/workspace/segmentation/resnet_34_8s_cityscapes_best.pth'))
     38 fcn.cuda()

/home/sawyer/workspace/segmentation/pytorch-segmentation-detection/pytorch_segmentation_detection/models/resnet_dilated.pyc in __init__(self, num_classes)
    293 pretrained=True,
    294 output_stride=8,
--> 295 remove_avg_pool_layer=True)
    296
    297 # Randomly initialize the 1x1 Conv scoring layer

/home/sawyer/workspace/segmentation/pytorch-segmentation-detection/vision/torchvision/models/resnet.pyc in resnet34(pretrained, **kwargs)
    172 pretrained (bool): If True, returns a model pre-trained on ImageNet
    173 """
--> 174 model = ResNet(BasicBlock, [3, 4, 6, 3], **kwargs)
    175 if pretrained:
    176 model.load_state_dict(model_zoo.load_url(model_urls['resnet34']))

TypeError: __init__() got an unexpected keyword argument 'fully_conv'

This happens with both Python 2.7 and Python 3.5, with torch version 1.0.1.

opened by ikoc 1
Convert map outputs to lists for Python3 support

Hey Daniil, thanks for your work on this repo.

I noticed that some places in the code use the output of the map function directly, which is a problem for Python3 since the function returns a map object instead of a list. I've just gone ahead and converted instances of map object to list in a bunch of places -- I noticed you did this for some files already, but it's currently breaking the cityscapes dataset, which I'm currently trying to use :-)

The Python2 docs guarantee the result of map is always a list so I assume this shouldn't break anything.

opened by erasaur 1
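
The Python 3 behaviour described in this comment can be reproduced with the standard library alone. The variable names below are chosen for illustration only and do not come from the repository.

```python
values = ["1", "2", "3"]

lazy = map(int, values)                # Python 3: a one-shot iterator, not a list
materialized = list(map(int, values))  # a real list on both Python 2 and 3

first_pass = list(lazy)
second_pass = list(lazy)               # the iterator is already exhausted

print(first_pass)                          # [1, 2, 3]
print(second_pass)                         # []
print(materialized[0], len(materialized))  # 1 3
```

Code that indexes the result, takes its length, or iterates over it twice therefore needs the `list(...)` wrapper under Python 3.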
fully_conv in vgg16
Very good repository.

How did you make this work? It seems that vgg16 does not have the fully_conv keyword in torchvision:

vgg16 = models.vgg16(pretrained=True, fully_conv=True)
opened by IssamLaradji 1
Optimizer for unet model on Pascal Voc segmentation

Hello, may I know which optimizer and settings to use for the unet model on Pascal VOC segmentation with focal loss? Should I use any learning rate schedulers? Also, is it better to take the mean focal loss or the sum focal loss? I am training the model from scratch.

opened by saiphanish7 0
Error(s) in loading state_dict for Resnet18_8s

@warmspringwinds I am getting the following error for resnet_18_8s_59.pth

RuntimeError: Error(s) in loading state_dict for Resnet18_8s: size mismatch for resnet18_8s.fc.bias: copying a param with shape torch.Size([2]) from checkpoint, the shape in current model is torch.Size([21]). size mismatch for resnet18_8s.fc.weight: copying a param with shape torch.Size([2, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([21, 512, 1, 1]).

If I change num_classes=21 to num_classes=2, it generates the output without any segmentation (purple screen)

opened by sagar1garg 7
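
The shapes in the error message above show that the checkpoint stores a scoring layer with two output channels, while the model was built for 21. A check of this kind can be run before loading the weights. The following is a sketch under assumptions: the helper names are hypothetical, and the input is a plain dictionary of shapes, which for a real checkpoint could be collected with `{name: tuple(t.shape) for name, t in state_dict.items()}`.

```python
def output_channels(shapes, key):
    """Return the first dimension of the parameter stored under `key`."""
    return shapes[key][0]


def check_num_classes(shapes, key, num_classes):
    """Raise a readable error before load_state_dict would fail."""
    found = output_channels(shapes, key)
    if found != num_classes:
        raise ValueError(
            f"checkpoint has {found} output classes under '{key}', "
            f"but the model was built with num_classes={num_classes}"
        )
    return found


# Shapes copied from the error message above.
checkpoint_shapes = {
    "resnet18_8s.fc.weight": (2, 512, 1, 1),
    "resnet18_8s.fc.bias": (2,),
}
print(output_channels(checkpoint_shapes, "resnet18_8s.fc.weight"))  # 2
```

Calling `check_num_classes(checkpoint_shapes, "resnet18_8s.fc.weight", 21)` raises a `ValueError` that names both numbers.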

RuntimeError: Error(s) in loading_state_dict for VGG

Using your fork of torchvision and a default installation of PyTorch for Linux-Python3.6-CUDA10:

1. The init_weights argument in the class VGG was missing.
2. After fixing (1), the following error was generated:

In [1]: from torchvision import models                                                                               
In [2]: model = models.vgg16(pretrained=True, fully_conv=True)                                                       

RuntimeError                              Traceback (most recent call last)
<ipython-input-2-802ee77a237c> in <module>
----> 1 model = models.vgg16(pretrained=True, fully_conv=True)

~/repositories/github/pytorch-segmentation-detection/vision/torchvision/models/vgg.py in vgg16(pretrained, **kwargs)
    164     model = VGG(make_layers(cfg['D']), **kwargs)
    165     if pretrained:
--> 166         model.load_state_dict(model_zoo.load_url(model_urls['vgg16']))
    167     return model
    168 

~/.local/lib/python3.6/site-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
    767         if len(error_msgs) > 0:
    768             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
--> 769                                self.__class__.__name__, "\n\t".join(error_msgs)))
    770 
    771     def _named_members(self, get_members_fn, prefix='', recurse=True):

RuntimeError: Error(s) in loading state_dict for VGG:
	size mismatch for classifier.0.weight: copying a param with shape torch.Size([4096, 25088]) from checkpoint, the shape in current model is torch.Size([4096, 512, 7, 7]).
	size mismatch for classifier.3.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 4096, 1, 1]).
	size mismatch for classifier.6.weight: copying a param with shape torch.Size([1000, 4096]) from checkpoint, the shape in current model is torch.Size([1000, 4096, 1, 1]).

opened by ar13pit 0
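
The mismatched shapes in this report differ only in layout: 25088 equals 512 x 7 x 7, so each fully connected weight has as many elements as the convolutional weight the model expects. The sketch below checks that arithmetic. Whether and how the fork converts the pretrained weights is not shown in the source, and the function name `fc_to_conv_shape` is hypothetical.

```python
def fc_to_conv_shape(fc_shape, in_channels, kernel_size):
    """Shape a Linear weight would take as an equivalent Conv2d weight.

    fc_shape: (out_features, in_features) of the Linear layer.
    Returns (out_features, in_channels, kernel_h, kernel_w).
    """
    out_features, in_features = fc_shape
    kernel_h, kernel_w = kernel_size
    if in_features != in_channels * kernel_h * kernel_w:
        raise ValueError("in_features does not factor into the requested kernel")
    return (out_features, in_channels, kernel_h, kernel_w)


# Shapes taken from the size-mismatch message above.
print(fc_to_conv_shape((4096, 25088), 512, (7, 7)))  # (4096, 512, 7, 7)
print(fc_to_conv_shape((4096, 4096), 4096, (1, 1)))  # (4096, 4096, 1, 1)
print(fc_to_conv_shape((1000, 4096), 4096, (1, 1)))  # (1000, 4096, 1, 1)
```

With PyTorch tensors, the corresponding reshape of a checkpoint entry could be written as `weight.view(*target_shape)` before calling `load_state_dict`.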

RuntimeError: value cannot be converted to type float without overflow

Hi, I am trying to train a model using Python 3, but I get the issue below:

/home/v2m/anaconda3/envs/my_env3/lib/python3.7/site-packages/torch/nn/_reduction.py:46: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
/home/v2m/anaconda3/envs/my_env3/lib/python3.7/site-packages/torch/nn/functional.py:2622: UserWarning: nn.functional.upsample_bilinear is deprecated. Use nn.functional.interpolate instead.
  warnings.warn("nn.functional.upsample_bilinear is deprecated. Use nn.functional.interpolate instead.")
0.4247354666311015
0.5617590819998219
0.5815541637890524
0.6344758887881029
Traceback (most recent call last):
  File "pytorch_segmentation_detection/recipes/pascal_voc/segmentation/psp_resnet_50_8s_train.py", line 376, in <module>
    optimizer.step()
  File "/home/v2m/anaconda3/envs/my_env3/lib/python3.7/site-packages/torch/optim/adam.py", line 107, in step
    p.data.addcdiv_(-step_size, exp_avg, denom)
RuntimeError: value cannot be converted to type float without overflow: (3.52033e-08,-1.14383e-08)

Can someone give me a suggestion?

opened by dgks0n 1

Difference Between Semantic Segmentation and Image Classification

I'm new to implementing CNNs and I'm trying to understand how a model knows whether to perform semantic classification (pixelwise) or image classification (one class per image). As far as I can see, the only difference is in the models/resnet_dilated.py file in the lines resnet34_8s.fc = nn.Conv2d(resnet34_8s.inplanes, num_classes, 1)

whereas most other codes have it as resnet34_8s.fc = nn.Conv2d(resnet34_8s.fc.in_features, num_classes)

Is this the difference between returning logits of shape [batch x num_classes x H x W] and [batch x num_classes]?

opened by mugdhapolimera 0
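
The two output shapes asked about here can be illustrated without any framework. A 1x1 convolution applies one weight matrix at every pixel and keeps the spatial layout, while a classification head first pools the feature map to one vector. The sketch below works on a single sample (no batch dimension) with plain nested lists. The function names are hypothetical, bias terms are omitted, and it is not a statement about how resnet_dilated.py is implemented.

```python
def conv1x1(features, weights):
    """Apply a 1x1 convolution (no bias) to features[c][h][w].

    weights[k][c] maps C input channels to K class scores at every pixel,
    so the spatial layout is preserved: output[k][h][w].
    """
    channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    return [
        [
            [
                sum(w_k[c] * features[c][y][x] for c in range(channels))
                for x in range(width)
            ]
            for y in range(height)
        ]
        for w_k in weights
    ]


def pooled_linear(features, weights):
    """Global-average-pool each channel, then apply the same weights once."""
    pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]
    return [sum(w * p for w, p in zip(w_k, pooled)) for w_k in weights]


# 2 channels, 2x3 feature map, 3 classes.
features = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]],
]
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

dense = conv1x1(features, weights)
single = pooled_linear(features, weights)
print(len(dense), len(dense[0]), len(dense[0][0]))  # 3 2 3
print(len(single))                                  # 3
```

The first result has one score per class per pixel (num_classes x H x W); the second has one score per class for the whole image.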

Owner

Daniil Pakhomov

PhD student at JHU. Research interests: Image Classification, Image Segmentation, Face Detection and Face Recognition, mostly based on Deep Learning.

GitHub

Hybrid CenterNet - Hybrid-supervised object detection / Weakly semi-supervised object detection

Hybrid-Supervised Object Detection System Object detection system trained by hybrid-supervision/weakly semi-supervision (HSOD/WSSOD): This project is

5 Dec 10, 2022

Yolo object detection - Yolo object detection with python

How to run download required files make build_image make download Docker versio

3 Jan 26, 2022

Auto-Lama combines object detection and image inpainting to automate object removals

Auto-Lama Auto-Lama combines object detection and image inpainting to automate object removals. It is built on top of DE:TR from Facebook Research and

44 Dec 9, 2022

MOT-Tracking-by-Detection-Pipeline - For Tracking-by-Detection format MOT (Multi Object Tracking), is it a framework that separates Detection and Tracking processes?

MOT-Tracking-by-Detection-Pipeline Tracking-by-Detection format MOT (Multi Object Trac

41 Nov 23, 2022

Official PyTorch implementation of Joint Object Detection and Multi-Object Tracking with Graph Neural Networks

This is the official PyTorch implementation of our paper: "Joint Object Detection and Multi-Object Tracking with Graph Neural Networks". Our project website and video demos are here.

443 Dec 6, 2022

Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch.

Faster R-CNN and Mask R-CNN in PyTorch 1.0 maskrcnn-benchmark has been deprecated. Please see detectron2, which includes implementations for all model

9k Jan 4, 2023

Tools to create pixel-wise object masks, bounding box labels (2D and 3D) and 3D object model (PLY triangle mesh) for object sequences filmed with an RGB-D camera.

Tools to create pixel-wise object masks, bounding box labels (2D and 3D) and 3D object model (PLY triangle mesh) for object sequences filmed with an RGB-D camera. This project prepares training and testing data for various deep learning projects such as 6D object pose estimation projects singleshotpose, as well as object detection and instance segmentation projects.

305 Dec 16, 2022

This project uses Template Matching technique for object detecting by detection of template image over base image.

Object Detection Project Using OpenCV This project uses Template Matching technique for object detecting by detection the template image over base ima

7 May 29, 2022

This project uses Template Matching technique for object detecting by detection of template image over base image

Object Detection Project Using OpenCV This project uses Template Matching technique for object detecting by detection the template image over base ima

4 Nov 16, 2021

Object tracking and object detection is applied to track golf puts in real time and display stats/games.

Putting_Game Object tracking and object detection is applied to track golf puts in real time and display stats/games. Works best with the Perfect Prac

1 Dec 29, 2021

Complete-IoU (CIoU) Loss and Cluster-NMS for Object Detection and Instance Segmentation (YOLACT)

Complete-IoU Loss and Cluster-NMS for Improving Object Detection and Instance Segmentation. Our paper is accepted by IEEE Transactions on Cybernetics

290 Dec 25, 2022

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Mask R-CNN for Object Detection and Segmentation This is an implementation of Mask R-CNN on Python 3, Keras, and TensorFlow. The model generates bound

22.5k Jan 4, 2023

Object Detection and Multi-Object Tracking

1.6k Jan 4, 2023

Detectron2 is FAIR's next-generation platform for object detection and segmentation.

Detectron2 is Facebook AI Research's next generation software system that implements state-of-the-art object detection algorithms. It is a ground-up r

23.3k Jan 8, 2023

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.

Swin Transformer for Object Detection This repo contains the supported code and configuration files to reproduce object detection results of Swin Tran

1.4k Dec 30, 2022

This repository allows you to anonymize sensitive information in images/videos. The solution is fully compatible with the DL-based training/inference solutions that we already published/will publish for Object Detection and Semantic Segmentation.

BMW-Anonymization-Api Data privacy and individuals’ anonymity are and always have been a major concern for data-driven companies. Therefore, we design

148 Dec 21, 2022

Object detection and instance segmentation toolkit based on PaddlePaddle.

9.3k Jan 2, 2023

Res2Net for Instance segmentation and Object detection using MaskRCNN

Res2Net for Instance segmentation and Object detection using MaskRCNN Since the MaskRCNN-benchmark of facebook is deprecated, we suggest to use our mm

55 Oct 30, 2022

Tensorflow 2.x implementation of Panoramic BlitzNet for object detection and semantic segmentation on indoor panoramic images.

Deep neural network for object detection and semantic segmentation on indoor panoramic images. The implementation is based on the papers:

9 Nov 24, 2022
