Dilated Convolution for Semantic Image Segmentation

Fisher Yu

Last update: Dec 26, 2022

Overview

Multi-Scale Context Aggregation by Dilated Convolutions

Introduction

Properties of dilated convolution are discussed in our ICLR 2016 conference paper. This repository contains the network definitions and the trained models. You can use this code together with vanilla Caffe to segment images using the pre-trained models. If you want to train the models yourself, please check out the document for training.

If you are looking for dilation models with state-of-the-art performance and a Python implementation, please check out Dilated Residual Networks.

Citing

If you find the code or the models useful, please cite this paper:

@inproceedings{YuKoltun2016,
	author    = {Fisher Yu and Vladlen Koltun},
	title     = {Multi-Scale Context Aggregation by Dilated Convolutions},
	booktitle = {ICLR},
	year      = {2016},
}

License

The code and models are released under the MIT License (refer to the LICENSE file for details).

Installation

Caffe

Install Caffe and its Python interface. Make sure that the Caffe version is newer than commit 08c5df.

Python

The companion Python script is used to demonstrate the network definition and trained weights.

The required Python packages are numba, numpy, and opencv. The Anaconda Python distribution is recommended.

If you use Anaconda:

conda install numba numpy opencv
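
Before running the demo, it can help to confirm that the three packages are importable. The following is a minimal sketch and not part of the repository. It assumes Python 3 and that the conda `opencv` package is imported under the module name `cv2`; `missing_packages` is a hypothetical helper name.

```python
import importlib.util

# Map conda package names to the module names they are imported under.
# Assumption: opencv is imported as cv2; the other two use their own name.
REQUIRED = {"numba": "numba", "numpy": "numpy", "opencv": "cv2"}


def missing_packages(required=REQUIRED):
    """Return the package names whose modules cannot be found."""
    return [pkg for pkg, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All required packages are importable.")
```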

Running Demo

predict.py is the main script to test the pre-trained models on images. The basic usage is

python predict.py <dataset name> <image path>

Given the dataset name, the script will find the pre-trained model and network definition. We currently support models trained on four datasets: pascal_voc, camvid, kitti, cityscapes. The steps for using the code are listed below:

Clone the code from Github

git clone git@github.com:fyu/dilation.git
cd dilation

Download pre-trained network
```
sh pretrained/download_pascal_voc.sh
```

Run pascal voc model on GPU 0

python predict.py pascal_voc images/dog.jpg --gpu 0
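
If you want to call the demo from another Python program, the command line above can be assembled programmatically. This is only a sketch under stated assumptions: `build_predict_command` and `run_predict` are hypothetical helpers, the dataset names are the four listed in this README, and the script is assumed to be run from the repository root. What `predict.py` does when `--gpu` is omitted is not covered here.

```python
import subprocess
import sys

# Dataset names listed in this README.
SUPPORTED_DATASETS = ("pascal_voc", "camvid", "kitti", "cityscapes")


def build_predict_command(dataset, image_path, gpu=None):
    """Build the argument list for predict.py following the basic usage."""
    if dataset not in SUPPORTED_DATASETS:
        raise ValueError("unsupported dataset: %r" % (dataset,))
    cmd = [sys.executable, "predict.py", dataset, image_path]
    if gpu is not None:
        cmd += ["--gpu", str(gpu)]
    return cmd


def run_predict(dataset, image_path, gpu=None):
    """Run predict.py in the current directory; raises if it exits non-zero."""
    subprocess.check_call(build_predict_command(dataset, image_path, gpu))
```

For example, `run_predict("pascal_voc", "images/dog.jpg", gpu=0)` corresponds to the last command above.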

Training

You are more than welcome to train our model on a new dataset. To do that, please refer to the document for training.

Implementation of Dilated Convolution

Besides Caffe support, dilated convolution is also implemented in other deep learning packages. For example,

Torch: SpatialDilatedConvolution
Lasagne: DilatedConv2DLayer
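
To make the operation itself concrete, here is a minimal NumPy sketch of a single-channel 2D dilated convolution. It is an illustration only, not the Caffe implementation used by this repository. It uses the cross-correlation form common in deep learning frameworks, "valid" borders, and no stride or padding; `dilated_conv2d` is a hypothetical name.

```python
import numpy as np


def dilated_conv2d(image, kernel, dilation=1):
    """Valid-mode 2D dilated cross-correlation of a single-channel image."""
    kh, kw = kernel.shape
    # Extent covered by the kernel once its taps are spread apart.
    eff_h = (kh - 1) * dilation + 1
    eff_w = (kw - 1) * dilation + 1
    out_h = image.shape[0] - eff_h + 1
    out_w = image.shape[1] - eff_w + 1
    if out_h <= 0 or out_w <= 0:
        raise ValueError("image is smaller than the dilated kernel")
    out = np.zeros((out_h, out_w), dtype=np.result_type(image, kernel))
    for i in range(kh):
        for j in range(kw):
            # Tap (i, j) reads the input at an offset of dilation * (i, j).
            patch = image[i * dilation:i * dilation + out_h,
                          j * dilation:j * dilation + out_w]
            out += kernel[i, j] * patch
    return out


image = np.arange(49.0).reshape(7, 7)
kernel = np.ones((3, 3))
print(dilated_conv2d(image, kernel, dilation=1).shape)  # (5, 5)
print(dilated_conv2d(image, kernel, dilation=2).shape)  # (3, 3)
```

With dilation 2, the same 3x3 kernel covers a 5x5 window of the input while still using only nine weights, which is the property the context module builds on.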

Comments

ReadProtoFromBinaryFile problem

Hello fyu, I am using the code for some tests. I downloaded the model following your instructions, but when I run python predict.py images/dog.jpg --gpu 0 I always hit this problem:

upgrade_proto.cpp:86] Check failed: ReadProtoFromBinaryFile(param_file, param) Failed to parse NetParameter file: ./pretrained/dilated_convolution_context_coco.caffemodel

So, is there something wrong with your caffemodel, or is there some other problem?

Thank you~

opened by likesiwell 14
Check failed: outer_num_ * inner_num_ == bottom[1]->count()
I want to train the context module, but get the following error:

F0911 01:01:41.267956 15432 softmax_loss_layer.cpp:47] Check failed: outer_num_ * inner_num_ == bottom[1]->count() (3276800 vs. 435600) Number of labels must match number of predictions; e.g., if softmax axis == 1 and prediction shape is (N, C, H, W), label count (number of labels) must be N*H*W, with integer values in {0, 1, ..., C-1}.

I followed the documentation "training.md". First, I trained the front-end module and then generated the .bin files from test.py (and the feats.txt). I use the following to start the training:

python train.py context \
    --train_image /home/timo/dilation/feat/train/feats.txt \
    --train_label /home/timo/Cityscapes/gtFine/train/train_city_gt.txt \
    --test_image /home/timo/dilation/feat/val/feats.txt \
    --test_label /home/timo/Cityscapes/gtFine/val/val_city_gt.txt \
    --train_batch 100 \
    --test_batch 10 \
    --caffe /home/timo/dilation/caffe-dilation/build_master_release/tools/caffe \
    --classes 19 \
    --layers 10 \
    --label_shape 66 66 --lr 0.0001 --momentum 0.99

I am grateful for every tip.
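
The two numbers in the failed check can be reproduced from the command above. As a rough sketch (the helper name is hypothetical, and the reading of the first number is an inference from the error text, not a confirmed diagnosis): with --train_batch 100 and --label_shape 66 66, the label blob holds 100 x 66 x 66 = 435600 values, which matches the second number. The first number, 3276800, then corresponds to 32768 predicted positions per image, so the network output appears to be spatially larger than the 66 x 66 labels.

```python
def label_count(batch, height, width):
    """Number of label values expected for an (N, C, H, W) prediction."""
    return batch * height * width


labels = label_count(100, 66, 66)
predictions = 3276800  # first number reported in the failed check

print(labels)              # 435600
print(predictions // 100)  # 32768 predicted positions per image
```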
opened by Timo-hab 6
there is no _caffe.so after make pycaffe

hi,

When I run

make all
make test
make pycaffe

there is no _caffe.so in caffe-dilation/python/caffe,

and when I try predict.py I get this error: ImportError: libcaffe.so.1.0.0-rc3: cannot open shared object file: No such file or directory

I have already added the build_master/python into my PYTHONPATH

Does anyone know how to solve this problem?

Cheers, Mao

opened by maolin23 5
Get label images for evaluation

Hi Fisher

Thanks for this great repository! Could you tell me how to convert the color images to label images, where each pixel has an ID that represents the ground truth label? Is there already a script for this? Thanks in advance!

Best Timo
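
One possible way to do this conversion with NumPy is sketched below. It is not a script from this repository. The palette entries, the ignore value 255, and the name `color_to_label` are illustrative assumptions; the real color-to-ID mapping has to come from the dataset's own definition.

```python
import numpy as np

# Hypothetical palette: RGB color -> label ID. Replace with the dataset's own.
PALETTE = {
    (128, 64, 128): 0,
    (244, 35, 232): 1,
    (70, 70, 70): 2,
}
IGNORE_LABEL = 255  # assumed value for pixels whose color is not in the palette


def color_to_label(color_image, palette=PALETTE, ignore_label=IGNORE_LABEL):
    """Convert an (H, W, 3) RGB array into an (H, W) array of label IDs."""
    labels = np.full(color_image.shape[:2], ignore_label, dtype=np.uint8)
    for color, label_id in palette.items():
        target = np.array(color, dtype=color_image.dtype)
        # A pixel matches only if all three channels equal the palette color.
        mask = np.all(color_image == target, axis=-1)
        labels[mask] = label_id
    return labels
```

Note that images loaded with OpenCV's `cv2.imread` are in BGR channel order, so they would need to be converted to RGB (or the palette reversed) before calling this function.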

opened by Timo-hab 4
"cudaSuccess (2 vs. 0) out of memory" on GTX Titan

Hi Fisher, since great results have been reported with dilated CNNs, I'm thinking of using this as the segmentation engine for my research. After reading your paper, today I started to play around with your code. Strangely, I kept getting out-of-memory errors when I tried the predict script. After checking the closed issues, I changed Caffe back to commit 08c5df but got the same errors.

I tried camvid, kitti and cityscapes. On camvid it worked, but not on the other two. Since the GTX Titan has 12GB of memory, "out of memory" seems very weird to me. Is there any hint from your side?

Does anybody else get similar errors?

cheers Rui

opened by rui2016 3
Training files

Thanks for sharing your pre-trained models. Could you also share your training prototxt files and the solver configuration, so that we can easily try to reproduce and/or use your architecture on other datasets ?

opened by nshaud 3
Upconvolution layer at the end of Dilated10 ?

Hi there,

First of all, great work ! :)

I noticed there is an upconvolution layer at the end of Dilated10, (see here).

Correct me if I'm wrong but I don't remember seeing this mentioned in the paper. Is this the model which led to the results presented in the paper ? Or perhaps it is a new one ? If so, could you kindly advise on whether you kept the same training procedure ? Thanks ! Cheers,

Pauline

opened by paulineluc 2
an error occurs when make pycaffe

The Python code needs to import caffe, but when I run make pycaffe, an error occurs:

rsync -a --include '*/' --include '*.py' --exclude '*' python/caffe/ build_master_release/python/caffe

What's the problem? I cloned Caffe from your link, fyu/caffe-dilation.

opened by Engineering-Course 2
How can I train on my own dataset?

I want to train the model with my own dataset. I'm confused about the input and output forms of the image data. Could you help me? Could you please share the training prototxt files and the solver configuration? Thanks.

opened by Engineering-Course 2
how to set the dilation?

Hi, I want to use the ResNet50 with dilation, and I don't know which layer's dilation parameter should be added. Is there any suggestion for me?

Thanks.
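
One common approach, in the spirit of the Dilated Residual Networks mentioned in the introduction, is to drop the stride of the last stages and dilate their convolutions by the accumulated factor, so the feature map stays larger while the receptive field is preserved. The sketch below only does that stage-level bookkeeping. The function name is hypothetical, the stride list is the usual ResNet-50 layout (first convolution, max pooling, then four residual stages), and how the dilation is applied to individual layers inside a stage differs between implementations.

```python
def strides_to_dilations(stage_strides, max_output_stride):
    """Replace striding with dilation once the output stride limit is reached.

    Returns a list of (stride, dilation) pairs, one per stage.
    """
    plan = []
    output_stride = 1
    dilation = 1
    for stride in stage_strides:
        if output_stride * stride > max_output_stride:
            # Keep the resolution: drop the stride and dilate this stage
            # (and all later ones) by the accumulated factor.
            dilation *= stride
            plan.append((1, dilation))
        else:
            output_stride *= stride
            plan.append((stride, dilation))
    return plan


# conv1, max pool, and the four residual stages of ResNet-50.
resnet50_strides = [2, 2, 1, 2, 2, 2]
print(strides_to_dilations(resnet50_strides, max_output_stride=8))
# [(2, 1), (2, 1), (1, 1), (2, 1), (1, 2), (1, 4)]
```

Under these assumptions, the last two residual stages lose their stride and use dilation 2 and 4 respectively, giving an overall output stride of 8 instead of 32.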

opened by linquanxu 1
apply_dilaton_conv_to_image_classification_such_as_imagenet

Hi, thanks for sharing!

Dilation is used for dense prediction, such as semantic segmentation. I have a naive idea: can we apply dilated convolution to image classification tasks, such as ImageNet?

Do you know of any work on this? Do you think dilation will work for image classification? Is it worth trying?

Thanks for your kind help and nice work!

opened by liu666666 1
What to put inside of training/testing image/label text files?

I'm training on my own dataset, but I'm not quite sure what to put in the training/testing image/label text files. As far as I understand, the contents are as follows:

train_image: <the list of paths of the original images>
train_label: <the list of paths of the images that are inverted to black and white, marking the area I want detected (the correct, expected result)>
test_image: <the list of paths of the images I want to test>
test_label: <?>

What to put in the test_label? Also, please correct me if I'm wrong.

opened by Itaru7 0
Relu problem

I download the CityScapesDataset dataset and run the code but I get an error

in __init__
    pretrained=pretrained, num_classes=1000)
in drn_c_26
    model = DRN(BasicBlock, [1, 1, 2, 2, 2, 2, 1, 1], arch='C', **kwargs)
in __init__
    self.relu = nn.ReLU(inplace=False)
in __init__
    super(ReLU, self).__init__(0, 0, inplace)

TypeError: super(type, obj): obj must be an instance or subtype of type

opened by fahmanali 0
How much GPU Memory is required/recommended to run the demo?

How much GPU memory is required/recommended to run the demo? I am trying to run the demo on my Nvidia Jetson TX1 with 4 GB of RAM and the program terminates ("Killed") and/or reboots the machine.

cuda 8 cudnn 6 ubuntu 16

#1: python predict.py pascal_voc images/dog.jpg --gpu 0

#2: python predict.py kitti images/example_kitti.png --gpu 0

nvidia@tegra-ubuntu:~/cviz/dilation$ python predict.py kitti images/example_kitti.png --gpu 0
I0213 00:39:05.451731 2439 gpu_memory.cpp:159] GPUMemory::Manager initialized with Caching (CUB) GPU Allocator
I0213 00:39:05.451974 2439 gpu_memory.cpp:161] Total memory: 4174815232, Free: 1888354304, dev_info[0]: total=4174815232 free=1888354304
I0213 00:39:05.452177 2439 gpu_memory.cpp:159] GPUMemory::Manager initialized with Caching (CUB) GPU Allocator
I0213 00:39:05.452195 2439 gpu_memory.cpp:161] Total memory: 4174815232, Free: 1888354304, dev_info[0]: total=4174815232 free=1888354304
Using GPU 0
I0213 00:39:05.463042 2439 upgrade_proto.cpp:66] Attempting to upgrade input file specified using deprecated input fields: models/dilation7_kitti_deploy.prototxt
I0213 00:39:05.463099 2439 upgrade_proto.cpp:69] Successfully upgraded file specified using deprecated input fields.
W0213 00:39:05.463116 2439 upgrade_proto.cpp:71] Note that future Caffe releases will only support input layers and not input fields.
I0213 00:39:05.463681 2439 net.cpp:70] Initializing net from parameters: state {
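
The memory figures in the log above are in bytes. A small sketch for reading them out and converting to GiB follows; `parse_gpu_memory` is a hypothetical helper, and the regular expression only assumes the "Total memory: ..., Free: ..." wording shown in this log.

```python
import re

LOG_LINE = ("I0213 00:39:05.451974 2439 gpu_memory.cpp:161] "
            "Total memory: 4174815232, Free: 1888354304, "
            "dev_info[0]: total=4174815232 free=1888354304")


def parse_gpu_memory(line):
    """Return (total_bytes, free_bytes) from a 'Total memory' log line."""
    match = re.search(r"Total memory: (\d+), Free: (\d+)", line)
    if match is None:
        raise ValueError("no memory figures found in line")
    return int(match.group(1)), int(match.group(2))


total, free = parse_gpu_memory(LOG_LINE)
print("total: %.2f GiB, free: %.2f GiB" % (total / 2.0**30, free / 2.0**30))
# total: 3.89 GiB, free: 1.76 GiB
```

In this log, less than half of the device memory is free before the network is even initialized.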

opened by kaisark 0
CPU Training and Net architecture

- How can I launch the context training on CPU?
- And how can I launch training on my edited network architecture? It seems that train.py takes only the weights, so I have to trick it by generating the architecture's weights with another module and feeding them in here.

opened by HamdiHamed1992 0

Owner

Fisher Yu

GitHub https://www.vis.xyz/pub/dilation

Classify bird species based on their songs using Siamese Networks and 1D dilated convolutions.

The goal is to classify different birds species based on their songs/calls. Spectrograms have been extracted from the audio samples and used as features for classification.

9 Dec 27, 2022

Dilated RNNs in pytorch

PyTorch Dilated Recurrent Neural Networks PyTorch implementation of Dilated Recurrent Neural Networks (DilatedRNN). Getting Started Installation: $ pi

200 Nov 17, 2022

Official code for "Stereo Waterdrop Removal with Row-wise Dilated Attention (IROS2021)"

Stereo-Waterdrop-Removal-with-Row-wise-Dilated-Attention This repository includes official codes for "Stereo Waterdrop Removal with Row-wise Dilated A

29 Oct 1, 2022

PyTorch version repo for CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes

Study-CSRNet-pytorch This is the PyTorch version repo for CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes

0 Mar 1, 2022

Facial Action Unit Intensity Estimation via Semantic Correspondence Learning with Dynamic Graph Convolution

FAU Implementation of the paper: Facial Action Unit Intensity Estimation via Semantic Correspondence Learning with Dynamic Graph Convolution. Yingruo

78 Nov 29, 2022

Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP

Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP Abstract: We introduce a method that allows to automatically se

134 Dec 19, 2022

TorchDistiller - a collection of the open source pytorch code for knowledge distillation, especially for the perception tasks, including semantic segmentation, depth estimation, object detection and instance segmentation.

This project is a collection of the open source pytorch code for knowledge distillation, especially for the perception tasks, including semantic segmentation, depth estimation, object detection and instance segmentation.

147 Dec 3, 2022

Mae segmentation - Reproduction of semantic segmentation using masked autoencoder (mae)

ADE20k Semantic segmentation with MAE Getting started Install the mmsegmentation

97 Dec 17, 2022

Exploring Cross-Image Pixel Contrast for Semantic Segmentation

Exploring Cross-Image Pixel Contrast for Semantic Segmentation Exploring Cross-Image Pixel Contrast for Semantic Segmentation, Wenguan Wang, Tianfei Z

510 Jan 2, 2023

A Comprehensive Analysis of Weakly-Supervised Semantic Segmentation in Different Image Domains (IJCV submission)

wsss-analysis The code of: A Comprehensive Analysis of Weakly-Supervised Semantic Segmentation in Different Image Domains, arXiv pre-print 2019 paper.

48 Dec 18, 2022

A Strong Baseline for Image Semantic Segmentation

A Strong Baseline for Image Semantic Segmentation Introduction This project is an open source semantic segmentation toolbox based on PyTorch. It is ba

49 Sep 20, 2022

ICCV2021 - Mining Contextual Information Beyond Image for Semantic Segmentation

Introduction The official repository for "Mining Contextual Information Beyond Image for Semantic Segmentation". Our full code has been merged into ss

55 Nov 9, 2022

An extremely simple, intuitive, hardware-friendly, and well-performing network structure for LiDAR semantic segmentation on 2D range image. IROS21

FIDNet_SemanticKITTI Motivation Implementing complicated network modules with only one or two points improvement on hardware is tedious. So here we pr

54 Dec 12, 2022

A unet implementation for Image semantic segmentation

PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)

Official pytorch implementation of paper "Inception Convolution with Efficient Dilation Search" (CVPR 2021 Oral).

[CVPR 2021] Involution: Inverting the Inherence of Convolution for Visual Recognition, a brand new neural operator

Code for Mesh Convolution Using a Learned Kernel Basis

Diverse Branch Block: Building a Convolution as an Inception-like Unit