Dilated Convolution for Semantic Image Segmentation

Overview

Multi-Scale Context Aggregation by Dilated Convolutions

Introduction

Properties of dilated convolution are discussed in our ICLR 2016 conference paper. This repository contains the network definitions and the trained models. You can use this code together with vanilla Caffe to segment images using the pre-trained models. If you want to train the models yourself, please check out the document for training.

If you are looking for dilation models with state-of-the-art performance and a Python implementation, please check out Dilated Residual Networks.

Citing

If you find the code or the models useful, please cite this paper:

@inproceedings{YuKoltun2016,
	author    = {Fisher Yu and Vladlen Koltun},
	title     = {Multi-Scale Context Aggregation by Dilated Convolutions},
	booktitle = {ICLR},
	year      = {2016},
}

License

The code and models are released under the MIT License (refer to the LICENSE file for details).

Installation

Caffe

Install Caffe and its Python interface. Make sure that the Caffe version is newer than commit 08c5df.

Python

The companion Python script demonstrates the network definitions and the trained weights.

The required Python packages are numba, numpy and opencv. The Anaconda Python distribution is recommended.

If you use Anaconda, install them with:

conda install numba numpy opencv
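
To confirm that the packages are visible to the Python interpreter you plan to use, a small check along these lines may help. It is a sketch, not part of this repository: the helper name find_missing is hypothetical, and it assumes the usual import names, where the opencv package is imported as cv2.

```python
import importlib.util

# Import names for the packages listed above. The conda package "opencv"
# is imported in Python as "cv2" (assumption based on the usual naming).
REQUIRED_MODULES = ("numba", "numpy", "cv2")


def find_missing(modules=REQUIRED_MODULES):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules are importable.")
```

The check only locates the modules and does not import them, so it does not verify that Caffe's Python interface was built correctly.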

Running Demo

predict.py is the main script for testing the pre-trained models on images. The basic usage is:

python predict.py <dataset name> <image path>

Given the dataset name, the script will find the pre-trained model and network definition. We currently support models trained on four datasets: pascal_voc, camvid, kitti and cityscapes. The steps for using the code are listed below:

  • Clone the code from GitHub

    git clone git@github.com:fyu/dilation.git
    cd dilation
    
  • Download pre-trained network

    sh pretrained/download_pascal_voc.sh
    
  • Run pascal voc model on GPU 0

    python predict.py pascal_voc images/dog.jpg --gpu 0
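
The same command line can also be assembled from Python, for example to loop over several images. The sketch below is illustrative only: build_predict_command is a hypothetical helper, and it relies solely on the usage pattern, the --gpu flag and the four dataset names given in this README.

```python
import sys

# Dataset names listed in this README.
SUPPORTED_DATASETS = ("pascal_voc", "camvid", "kitti", "cityscapes")


def build_predict_command(dataset, image_path, gpu=None):
    """Build the argument list for predict.py.

    Raises ValueError when the dataset name is not one of the four listed above.
    """
    if dataset not in SUPPORTED_DATASETS:
        raise ValueError(
            "unknown dataset %r, expected one of %s"
            % (dataset, ", ".join(SUPPORTED_DATASETS))
        )
    command = [sys.executable, "predict.py", dataset, image_path]
    if gpu is not None:
        # The --gpu flag selects the GPU index, as in the step above.
        command += ["--gpu", str(gpu)]
    return command


if __name__ == "__main__":
    print(" ".join(build_predict_command("pascal_voc", "images/dog.jpg", gpu=0)))
```

The returned list can be passed to subprocess.run from the repository root once the pre-trained model has been downloaded.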
    

Training

You are more than welcome to train our model on a new dataset. To do that, please refer to the document for training.

Implementation of Dilated Convolution

Besides Caffe, dilated convolution is also implemented in other deep learning packages.
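
To illustrate the operation itself, here is a minimal NumPy sketch of a single-channel 2D dilated convolution. It is not code from this repository or from any of those packages. It assumes "valid" borders (no padding) and the cross-correlation convention common in deep learning frameworks, and the function name dilated_conv2d is hypothetical. With dilation 1 it reduces to an ordinary convolution; larger dilations spread the same kernel taps over a wider window without adding parameters.

```python
import numpy as np


def dilated_conv2d(image, kernel, dilation=1):
    """Valid-mode 2D dilated cross-correlation of one channel with one filter."""
    kh, kw = kernel.shape
    # Extent covered by the kernel once its taps are spread apart.
    eff_h = (kh - 1) * dilation + 1
    eff_w = (kw - 1) * dilation + 1
    out_h = image.shape[0] - eff_h + 1
    out_w = image.shape[1] - eff_w + 1
    if out_h <= 0 or out_w <= 0:
        raise ValueError("image is smaller than the dilated kernel")
    out = np.zeros((out_h, out_w), dtype=np.result_type(image, kernel))
    for i in range(kh):
        for j in range(kw):
            # Each tap reads a shifted view of the image; the shift grows
            # with the dilation factor.
            r, c = i * dilation, j * dilation
            out += kernel[i, j] * image[r:r + out_h, c:c + out_w]
    return out


if __name__ == "__main__":
    image = np.arange(49, dtype=np.float64).reshape(7, 7)
    kernel = np.ones((3, 3))
    print(dilated_conv2d(image, kernel, dilation=1).shape)  # (5, 5)
    print(dilated_conv2d(image, kernel, dilation=2).shape)  # (3, 3)
```

A 3x3 kernel with dilation 2 covers a 5x5 window, which is why the output shrinks from 5x5 to 3x3 on a 7x7 input.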

Comments
  • ReadProtoFromBinaryFile problem

    Hello fyu, I am using the code for some tests. I downloaded the model following your instructions, but when I run python predict.py images/dog.jpg --gpu 0 I always hit this problem:

    upgrade_proto.cpp:86] Check failed: ReadProtoFromBinaryFile(param_file, param) Failed to parse NetParameter file: ./pretrained/dilated_convolution_context_coco.caffemodel

    So, is there something wrong with your caffemodel, or is there some other problem?

    Thank you~

    opened by likesiwell 14
  • Check failed: outer_num_ * inner_num_ == bottom[1]->count()

    I want to train the context module, but get the following error:

    F0911 01:01:41.267956 15432 softmax_loss_layer.cpp:47] Check failed: outer_num_ * inner_num_ == bottom[1]->count() (3276800 vs. 435600) Number of labels must match number of predictions; e.g., if softmax axis == 1 and prediction shape is (N, C, H, W), label count (number of labels) must be N*H*W, with integer values in {0, 1, ..., C-1}.

    I followed the documentation in "training.md". First, I trained the front-end module and then generated the .bin files from test.py (and the feats.txt). I use the following command to start the training:

    python train.py context \
    --train_image /home/timo/dilation/feat/train/feats.txt \
    --train_label /home/timo/Cityscapes/gtFine/train/train_city_gt.txt \
    --test_image /home/timo/dilation/feat/val/feats.txt \
    --test_label /home/timo/Cityscapes/gtFine/val/val_city_gt.txt \
    --train_batch 100 \
    --test_batch 10 \
    --caffe /home/timo/dilation/caffe-dilation/build_master_release/tools/caffe \
    --classes 19 \
    --layers 10 \
    --label_shape 66 66 \
    --lr 0.0001 \
    --momentum 0.99
    

    I am grateful for every tip.

    opened by Timo-hab 6
  • there is no _caffe.so after make pycaffe

    Hi,

    When I run

    make all
    make test
    make pycaffe

    there is no _caffe.so in caffe-dilation/python/caffe.

    When I try predict.py, I get the error ImportError: libcaffe.so.1.0.0-rc3: cannot open shared object file: No such file or directory

    I have already added build_master/python to my PYTHONPATH.

    Does anyone know how to solve this problem?

    Cheers, Mao

    opened by maolin23 5
  • Get label images for evaluation

    Hi Fisher

    Thanks for this great repository! Could you tell me how to convert the color images to label images, where each pixel has an ID that represents the ground-truth label? Is there already a script for this? Thanks in advance!

    Best Timo

    opened by Timo-hab 4
  • "cudaSuccess (2 vs. 0) out of memory" on GTX Titan

    Hi Fisher, since great results have been reported with dilated CNNs, I'm thinking of using this as the segmentation engine for my research. After reading your paper, I started to play around with your code today. Strangely, I kept getting out-of-memory errors when I tried the predict script. After checking the closed issues, I changed Caffe back to commit 08c5df but got the same errors.

    I tried camvid, kitti and cityscapes. It worked on camvid, but not on the other two. Since the GTX Titan has 12 GB of memory, "out of memory" seems very odd to me. Do you have any hints?

    Does anybody else get similar errors?

    cheers Rui

    opened by rui2016 3
  • Training files

    Thanks for sharing your pre-trained models. Could you also share your training prototxt files and the solver configuration, so that we can easily try to reproduce and/or use your architecture on other datasets?

    opened by nshaud 3
  • Upconvolution layer at the end of Dilated10 ?

    Hi there,

    First of all, great work! :)

    I noticed there is an upconvolution layer at the end of Dilated10 (see here).

    Correct me if I'm wrong, but I don't remember seeing this mentioned in the paper. Is this the model that led to the results presented in the paper, or is it a new one? If it is new, could you kindly advise on whether you kept the same training procedure? Thanks! Cheers,

    Pauline

    opened by paulineluc 2
  • an error occurs when make pycaffe

    The Python code needs to import caffe, but when I run make pycaffe, an error occurs:

    rsync -a --include '*/' --include '*.py' --exclude '*'
    python/caffe/ build_master_release/python/caffe

    What's the problem? I cloned Caffe from your link, fyu/caffe-dilation.

    opened by Engineering-Course 2
  • How can I train on my own dataset?

    I want to train the model on my own dataset, but I'm confused about the input and output formats of the image data. Could you help me? Could you please share the training prototxt files and the solver configuration? Thanks.

    opened by Engineering-Course 2
  • how to set the dilation?

    Hi, I want to use ResNet50 with dilation, but I don't know which layers should get a dilation parameter. Do you have any suggestions?

    Thanks.

    opened by linquanxu 1
  • apply_dilaton_conv_to_image_classification_such_as_imagenet

    Hi, thanks for sharing!

    Dilation is used for dense prediction, such as semantic segmentation. I have a naive idea: can we apply dilated convolution to image classification tasks, such as ImageNet?

    Do you know of any work on this? Do you think dilation will work for image classification? Is it worth trying?

    Thanks for your kind help and nice work!

    opened by liu666666 1
  • What to put inside of training/testing image/label text files?

    I'm training on my own dataset, but I'm not quite sure what to put in the training/testing image/label text files. As far as I understand, the contents are as follows:

    train_image: <the list of paths of the original images>
    train_label: <the list of paths of the images that are inverted to black and white, marking the area I want detected (the correct, expected result)>
    test_image: <the list of paths of the images I want to test>
    test_label: <?>

    What should go in test_label? Also, please correct me if I'm wrong.

    opened by Itaru7 0
  • Relu problem

    I downloaded the Cityscapes dataset and ran the code, but I get an error:

    in __init__ pretrained=pretrained, num_classes=1000)
    in drn_c_26 model = DRN(BasicBlock, [1, 1, 2, 2, 2, 2, 1, 1], arch='C', **kwargs)
    in __init__ self.relu = nn.ReLU(inplace=False)
    __init__ super(ReLU, self).__init__(0, 0, inplace)

    TypeError: super(type, obj): obj must be an instance or subtype of type

    opened by fahmanali 0
  • How much GPU Memory is required/recommended to run the demo?

    How much GPU memory is required/recommended to run the demo? I am trying to run the demo on my Nvidia Jetson TX1 with 4 GB of RAM and the program terminates ("Killed") and/or reboots the machine.

    cuda 8 cudnn 6 ubuntu 16

    #1: python predict.py pascal_voc images/dog.jpg --gpu 0

    #2: python predict.py kitti images/example_kitti.png --gpu 0

    nvidia@tegra-ubuntu:~/cviz/dilation$ python predict.py kitti images/example_kitti.png --gpu 0
    I0213 00:39:05.451731 2439 gpu_memory.cpp:159] GPUMemory::Manager initialized with Caching (CUB) GPU Allocator
    I0213 00:39:05.451974 2439 gpu_memory.cpp:161] Total memory: 4174815232, Free: 1888354304, dev_info[0]: total=4174815232 free=1888354304
    I0213 00:39:05.452177 2439 gpu_memory.cpp:159] GPUMemory::Manager initialized with Caching (CUB) GPU Allocator
    I0213 00:39:05.452195 2439 gpu_memory.cpp:161] Total memory: 4174815232, Free: 1888354304, dev_info[0]: total=4174815232 free=1888354304
    Using GPU 0
    I0213 00:39:05.463042 2439 upgrade_proto.cpp:66] Attempting to upgrade input file specified using deprecated input fields: models/dilation7_kitti_deploy.prototxt
    I0213 00:39:05.463099 2439 upgrade_proto.cpp:69] Successfully upgraded file specified using deprecated input fields.
    W0213 00:39:05.463116 2439 upgrade_proto.cpp:71] Note that future Caffe releases will only support input layers and not input fields.
    I0213 00:39:05.463681 2439 net.cpp:70] Initializing net from parameters: state {

    opened by kaisark 0
  • CPU Training and Net architecture

    - How can I launch the context training on CPU?
    - How can I launch training on my edited network architecture? It seems that train.py only takes the weights, so I have to trick it by generating the weights for the architecture in another module and passing them in here.

    opened by HamdiHamed1992 0
Classify bird species based on their songs using Siamese Networks and 1D dilated convolutions.

The goal is to classify different bird species based on their songs/calls. Spectrograms have been extracted from the audio samples and used as features for classification.

Aditya Dutt 9 Dec 27, 2022
Dilated RNNs in pytorch

PyTorch Dilated Recurrent Neural Networks PyTorch implementation of Dilated Recurrent Neural Networks (DilatedRNN). Getting Started Installation: $ pi

Zalando Research 200 Nov 17, 2022
Official code for "Stereo Waterdrop Removal with Row-wise Dilated Attention (IROS2021)"

Stereo-Waterdrop-Removal-with-Row-wise-Dilated-Attention This repository includes official codes for "Stereo Waterdrop Removal with Row-wise Dilated A

null 29 Oct 1, 2022
PyTorch version repo for CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes

Study-CSRNet-pytorch This is the PyTorch version repo for CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes

null 0 Mar 1, 2022
Facial Action Unit Intensity Estimation via Semantic Correspondence Learning with Dynamic Graph Convolution

FAU Implementation of the paper: Facial Action Unit Intensity Estimation via Semantic Correspondence Learning with Dynamic Graph Convolution. Yingruo

Evelyn 78 Nov 29, 2022
Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP

Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP Abstract: We introduce a method that allows to automatically se

Daniil Pakhomov 134 Dec 19, 2022
TorchDistiller - a collection of the open source pytorch code for knowledge distillation, especially for the perception tasks, including semantic segmentation, depth estimation, object detection and instance segmentation.

This project is a collection of the open source pytorch code for knowledge distillation, especially for the perception tasks, including semantic segmentation, depth estimation, object detection and instance segmentation.

yifan liu 147 Dec 3, 2022
Mae segmentation - Reproduction of semantic segmentation using masked autoencoder (mae)

ADE20k Semantic segmentation with MAE Getting started Install the mmsegmentation

null 97 Dec 17, 2022
Exploring Cross-Image Pixel Contrast for Semantic Segmentation

Exploring Cross-Image Pixel Contrast for Semantic Segmentation Exploring Cross-Image Pixel Contrast for Semantic Segmentation, Wenguan Wang, Tianfei Z

Tianfei Zhou 510 Jan 2, 2023
A Comprehensive Analysis of Weakly-Supervised Semantic Segmentation in Different Image Domains (IJCV submission)

wsss-analysis The code of: A Comprehensive Analysis of Weakly-Supervised Semantic Segmentation in Different Image Domains, arXiv pre-print 2019 paper.

Lyndon Chan 48 Dec 18, 2022
A Strong Baseline for Image Semantic Segmentation

A Strong Baseline for Image Semantic Segmentation Introduction This project is an open source semantic segmentation toolbox based on PyTorch. It is ba

Clark He 49 Sep 20, 2022
ICCV2021 - Mining Contextual Information Beyond Image for Semantic Segmentation

Introduction The official repository for "Mining Contextual Information Beyond Image for Semantic Segmentation". Our full code has been merged into ss

null 55 Nov 9, 2022
An extremely simple, intuitive, hardware-friendly, and well-performing network structure for LiDAR semantic segmentation on 2D range image. IROS21

FIDNet_SemanticKITTI Motivation Implementing complicated network modules with only one or two points improvement on hardware is tedious. So here we pr

YimingZhao 54 Dec 12, 2022
A unet implementation for Image semantic segmentation

Unet-pytorch a unet implementation for Image semantic segmentation. Based on Unet segmentation code found online, this is a version for the Kaggle salt identification challenge; please get the dataset from: https://www.kaggle.com/c/tgs-salt-id

Rabbit 3 Jun 29, 2022
PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)

PyTorch implementation of Conformer: Convolution-augmented Transformer for Speech Recognition. Transformer models are good at capturing content-based

Soohwan Kim 565 Jan 4, 2023
Official pytorch implementation of paper "Inception Convolution with Efficient Dilation Search" (CVPR 2021 Oral).

IC-Conv This repository is an official implementation of the paper Inception Convolution with Efficient Dilation Search. Getting Started Download Imag

Jie Liu 111 Dec 31, 2022
[CVPR 2021] Involution: Inverting the Inherence of Convolution for Visual Recognition, a brand new neural operator

involution Official implementation of a neural operator as described in Involution: Inverting the Inherence of Convolution for Visual Recognition (CVP

Duo Li 1.3k Dec 28, 2022
Code for Mesh Convolution Using a Learned Kernel Basis

Mesh Convolution This repository contains the implementation (in PyTorch) of the paper FULLY CONVOLUTIONAL MESH AUTOENCODER USING EFFICIENT SPATIALLY

Yi_Zhou 35 Jan 3, 2023
Diverse Branch Block: Building a Convolution as an Inception-like Unit

Diverse Branch Block: Building a Convolution as an Inception-like Unit (PyTorch) (CVPR-2021) DBB is a powerful ConvNet building block to replace regul

null 253 Dec 24, 2022