Pyramid Scene Parsing Network, CVPR2017.

Hengshuang Zhao

Last update: Jan 5, 2023

Related tags

Deep Learning PSPNet

Overview

Pyramid Scene Parsing Network

by Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia. Details are on the project page.

Introduction

This repository is for 'Pyramid Scene Parsing Network', which ranked 1st in the ImageNet Scene Parsing Challenge 2016. The code is modified from the Caffe version of DeepLab v2 and from yjxiong (for evaluation). We merge the batch normalization layer named 'bn_layer' from the former into the latter, while keeping the original 'batch_norm_layer' in the latter unchanged for compatibility. The difference is that 'bn_layer' contains four parameters ('slope,bias,mean,variance'), while 'batch_norm_layer' contains two ('mean,variance'). Some of the evaluation code is borrowed from MIT Scene Parsing.
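To make the parameter difference concrete, here is a minimal Python sketch of how a four-parameter layer and a two-parameter layer could normalize a single value at inference time. The function names and the `eps` default are assumptions made for illustration; they are not taken from the Caffe sources.

```python
import math


def bn_four_params(x, slope, bias, mean, variance, eps=1e-5):
    """Normalize x, then apply a learned scale (slope) and shift (bias)."""
    return slope * (x - mean) / math.sqrt(variance + eps) + bias


def bn_two_params(x, mean, variance, eps=1e-5):
    """Normalize x only; any scale or shift would live in a separate layer."""
    return (x - mean) / math.sqrt(variance + eps)


if __name__ == "__main__":
    # Same statistics, with and without the learned affine part.
    print(bn_two_params(3.0, mean=1.0, variance=4.0))
    print(bn_four_params(3.0, slope=2.0, bias=0.5, mean=1.0, variance=4.0))
```

With `slope=1` and `bias=0` the two functions agree, which is one way to see why the two layer types can coexist in the same code base.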

PyTorch Version

A highly optimized PyTorch codebase for semantic segmentation is available in the repo semseg, including full training and testing code for PSPNet and PSANet.

Installation

For installation, please follow the instructions of Caffe and DeepLab v2. To enable cuDNN for GPU acceleration, cuDNN v4 is needed. If you encounter an error related to 'matio', please download and install matio as required by 'DeepLab v2'.

The code has been tested successfully on Ubuntu 14.04 and 12.04 with CUDA 7.0.

Usage

Clone the repository:

git clone https://github.com/hszhao/PSPNet.git

Build Caffe and matcaffe:

cd $PSPNET_ROOT
cp Makefile.config.example Makefile.config
vim Makefile.config
make -j8 && make matcaffe

Evaluation:
- Evaluation code is in folder 'evaluation'.
- Download trained models and put them in folder 'evaluation/model':
  - pspnet50_ADE20K.caffemodel: GoogleDrive
  - pspnet101_VOC2012.caffemodel: GoogleDrive
  - pspnet101_cityscapes.caffemodel: GoogleDrive
- Modify the related paths in 'eval_all.m':
  - Mainly the variables 'data_root' and 'eval_list'. Your image list for evaluation should be similar to those in the folder 'evaluation/samplelist' if you use this evaluation code structure.
  - MATLAB 'parfor' evaluation is used and the default GPU IDs are [0:3]. Modify the variable 'gpu_id_array' if needed. We assume that the number of images is divisible by the number of GPUs; if not, you can pad your image list, or switch to single-GPU evaluation by setting 'gpu_id_array' to a length of one and changing the 'parfor' loop to a 'for' loop.
```
cd evaluation
vim eval_all.m
```
- Run the evaluation scripts:
```
./run.sh
```
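The list-padding option mentioned for 'eval_all.m' can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper name `pad_image_list` is hypothetical, and repeating the last entry is one possible padding choice, not necessarily what the authors had in mind.

```python
def pad_image_list(paths, num_gpus):
    """Repeat the last entry until len(paths) is a multiple of num_gpus."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if not paths:
        return []
    remainder = len(paths) % num_gpus
    padding = 0 if remainder == 0 else num_gpus - remainder
    return list(paths) + [paths[-1]] * padding


if __name__ == "__main__":
    # Five images on four GPUs: three duplicates are appended.
    print(pad_image_list(["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"], 4))
```

If the list is padded this way, the duplicated entries would presumably need to be excluded again when scores are computed, so that no image is counted twice.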
Results:

Prediction results will be saved in the folder 'evaluation/mc_result', and the expected scores are:

(single-scale testing is denoted as 'ss' and multi-scale testing as 'ms')
- PSPNet50 on ADE20K valset (mIoU/pAcc): 41.68/80.04 (ss) and 42.78/80.76 (ms)
- PSPNet101 on VOC2012 testset (mIoU): 85.41 (ms)
- PSPNet101 on cityscapes valset (mIoU/pAcc): 79.70/96.38 (ss) and 80.91/96.59 (ms)
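For readers unfamiliar with the two metrics, the following Python sketch computes pixel accuracy (pAcc) and mean intersection over union (mIoU) from a confusion matrix using their standard definitions. The function names and the ignore label of 255 are assumptions for illustration; the repository's own evaluation scripts may handle details such as ignored pixels differently.

```python
def confusion_matrix(gt, pred, num_classes, ignore_label=255):
    """Count (ground truth, prediction) pairs over flat label sequences."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_label:
            continue  # pixels without a valid label do not count
        matrix[g][p] += 1
    return matrix


def pixel_accuracy(matrix):
    """Fraction of counted pixels whose prediction matches the label."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def mean_iou(matrix):
    """Average of per-class intersection over union."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        gt_count = sum(matrix[c])
        pred_count = sum(matrix[r][c] for r in range(n))
        union = gt_count + pred_count - tp
        if union > 0:  # skip classes absent from both label and prediction
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    m = confusion_matrix([0, 0, 1, 1, 255], [0, 1, 1, 1, 0], num_classes=2)
    print(pixel_accuracy(m), mean_iou(m))
```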
Demo video:

Video processed by PSPNet101 on cityscapes dataset:

Merge with colormap on side: Video1

Alpha blending with value as 0.5: Video2

Citation

If PSPNet is useful for your research, please consider citing:

@inproceedings{zhao2017pspnet,
  title={Pyramid Scene Parsing Network},
  author={Zhao, Hengshuang and Shi, Jianping and Qi, Xiaojuan and Wang, Xiaogang and Jia, Jiaya},
  booktitle={CVPR},
  year={2017}
}

Questions

Please contact '[email protected]'

Comments

matio.h could not found

Hi,

I came across the error "matio.h could not be found" while running make -j8. I also checked src/caffe/util/ and there is no matio.h there, either in your version of Caffe or in the BVLC version of Caffe.

The error message is as follows:

CXX src/caffe/util/matio_io.cpp
src/caffe/util/matio_io.cpp:10:19: fatal error: matio.h: No such file or directory
compilation terminated.
Makefile:575: recipe for target '.build_release/src/caffe/util/matio_io.o' failed
make: *** [.build_release/src/caffe/util/matio_io.o] Error 1

Best Wishes~

opened by zhengdixin 13

Can this version be compiled by CUDA 8.0?

Hi, I found that there are some errors when compiling with CUDA 8.0.

1 error detected in the compilation of "/tmp/tmpxft_00000a2e_00000000-5_domain_transform_forward_only_layer.cpp4.ii".
make: *** [.build_release/cuda/src/caffe/layers/domain_transform_forward_only_layer.o] Error 1
make: *** Waiting for unfinished jobs....

This problem can be also seen when compiling deeplab with CUDA 8.0.

Do you have any solution for this problem? Thank you.

opened by bearpaw 12

Define class (ColorMap) for my Data

Dear all.

I want to train PSPNet with my own data. Where and how can I define the classes (labels)? For example: color RGB(123,123,123) is for clothes, RGB(200,123,150) is for desk, and so on.

Thanks so much.
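One way to think about the question above is as a lookup from annotation color to integer class index. The sketch below is a hypothetical illustration: the palette uses the two colors from the question, while the class indices, the `IGNORE_LABEL` value of 255 and the helper name are assumptions. It does not show how the training data layer in this repository expects its labels.

```python
# Hypothetical palette: RGB color -> class index. The colors come from the
# question above; the class indices are arbitrary choices for illustration.
PALETTE = {
    (123, 123, 123): 0,  # clothes
    (200, 123, 150): 1,  # desk
}
IGNORE_LABEL = 255  # assumed value for pixels with no known class


def rgb_mask_to_labels(mask):
    """Convert rows of RGB tuples into rows of integer class indices."""
    return [[PALETTE.get(tuple(px), IGNORE_LABEL) for px in row] for row in mask]


if __name__ == "__main__":
    mask = [[(123, 123, 123), (200, 123, 150)],
            [(0, 0, 0), (123, 123, 123)]]
    print(rgb_mask_to_labels(mask))
```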

opened by ThienAnh 7
MATLAB crashes using cityscapes trained model

I have been trying to test pspnet101_VOC2012.caffemodel, pspnet101_cityscapes.caffemodel, and pspnet50_ADE20K.caffemodel with my own set of input images of size 1280*600. Each time I run eval_all.m, MATLAB crashes with the error shown below.

I don't understand what the actual problem is. I am running it on a GPU with the following configuration:

I built my Makefile.config with OpenMPI.

When I run pspnet101_VOC2012.caffemodel on a sample of images, it runs without crashing, but the results saved in /home/PSPNet/evaluation/mc_result/VOC2012/test/pspnet101_473/color are images in which every pixel has the value 11.

Please let me know what is wrong, and guide me to successfully test PSPNet with my dataset.

opened by uu-ujwalkumar06 6
Evaluation with VOC2012
Hi, as per step 3 (Evaluation):

I downloaded the VOC2012 dataset from http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar.

I updated the path in eval_all.m (inside /evaluation/samplelist).

I also used the existing VOC2012_test.txt.

While running ./run.sh, I got the error below.

Starting parallel pool (parpool) using the 'local' profile ... connected to 4 workers.
Error using importdata (line 137)
Unable to open file.

Error in eval_sub (line 3)
list = importdata(fullfile(data_root,eval_list));

Error in eval_all (line 71)
parfor i = 1:gpu_num %change 'parfor' to 'for' if singe GPU testing is used

One thing I observed is that the filenames listed in VOC2012_test.txt are missing from VOC2012/JPEGImages.

For example, the first 5 lines of VOC2012_test.txt are:
/JPEGImages/2008_000006.jpg
/JPEGImages/2008_000011.jpg
/JPEGImages/2008_000012.jpg
/JPEGImages/2008_000018.jpg
/JPEGImages/2008_000024.jpg

But the files available in VOC2012/JPEGImages are like:
/JPEGImages/2008_000002.jpg
/JPEGImages/2008_000003.jpg
/JPEGImages/2008_000007.jpg
/JPEGImages/2008_000008.jpg
/JPEGImages/2008_000009.jpg

It is unclear why files such as 6, 11, 12, 18, 24, ... are missing from VOC2012/JPEGImages. Do I need to download VOC2012 from somewhere else? Please let me know.

FYI, I tried downloading ADE20K, but its folder structure is totally different and I don't know how to modify ADE20K_val.txt (a lot of manual work), so I gave up.
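The mismatch described above can be checked mechanically before starting an evaluation run. The Python sketch below lists entries of an image list that do not exist under a data root. It assumes one entry per line with the image path as the first whitespace-separated field and a leading '/', as in the sample lines quoted above; the helper name is hypothetical. Since the `importdata` error refers to the list file itself, it may also be worth confirming that `fullfile(data_root,eval_list)` points to an existing file.

```python
import os


def missing_entries(data_root, list_file):
    """Return list-file entries that do not exist under data_root."""
    missing = []
    with open(list_file) as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            rel_path = fields[0].lstrip("/")  # entries start with '/'
            if not os.path.isfile(os.path.join(data_root, rel_path)):
                missing.append(fields[0])
    return missing


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 3:
        for entry in missing_entries(sys.argv[1], sys.argv[2]):
            print("missing:", entry)
```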
opened by gopi77 5
How to make it compatible with CUDNN v5

While trying to find out why I cannot compile PSPNet with cuDNN v5, I found that the author changed the README to say it is compatible with cuDNN v4... So, can anybody give me a hint on how to modify the code to make it compatible with cuDNN v5?

opened by zhixy 4

runtest failed

...
./include/caffe/test/test_gradient_check_util.hpp:175: Failure
The difference between computed_gradient and estimated_gradient is 0.033336639404296875, which exceeds threshold_ * scale, where
computed_gradient evaluates to 1.9999799728393555,
estimated_gradient evaluates to 1.9666433334350586, and
threshold_ * scale evaluates to 0.00019999798678327352.
debug: (top_id, top_data_id, blob_id, feat_id)=0,119,3,119; feat = 0.96758121252059937; objective+ = 2.1408424377441406; objective- = 2.1015095710754395
...
...
[  FAILED  ] BatchNormLayerTest/2.TestGradient, where TypeParam = caffe::GPUDevice<float> (3116 ms)
[ RUN      ] BatchNormLayerTest/2.TestForward
src/caffe/test/test_batch_norm_layer.cpp:75: Failure
The difference between 1 and var is 111861.75, which exceeds kErrorBound, where
1 evaluates to 1,
var evaluates to 111862.75, and
kErrorBound evaluates to 0.0010000000474974513.
src/caffe/test/test_batch_norm_layer.cpp:75: Failure
The difference between 1 and var is 118616.2578125, which exceeds kErrorBound, where
1 evaluates to 1,
var evaluates to 118617.2578125, and
kErrorBound evaluates to 0.0010000000474974513.
[  FAILED  ] BatchNormLayerTest/2.TestForward, where TypeParam = caffe::GPUDevice<float> (2 ms)
[ RUN      ] BatchNormLayerTest/2.TestForwardInplace
src/caffe/test/test_batch_norm_layer.cpp:119: Failure
The difference between 1 and var is 114000.234375, which exceeds kErrorBound, where
1 evaluates to 1,
var evaluates to 114001.234375, and
kErrorBound evaluates to 0.0010000000474974513.
src/caffe/test/test_batch_norm_layer.cpp:119: Failure
The difference between 1 and var is 91060.625, which exceeds kErrorBound, where
1 evaluates to 1,
var evaluates to 91061.625, and
kErrorBound evaluates to 0.0010000000474974513.
[  FAILED  ] BatchNormLayerTest/2.TestForwardInplace, where TypeParam = caffe::GPUDevice<float> (1 ms)
[----------] 3 tests from BatchNormLayerTest/2 (3119 ms total)

[----------] 8 tests from SliceLayerTest/3, where TypeParam = caffe::GPUDevice<double>
[ RUN      ] SliceLayerTest/3.TestSliceAcrossChannels
[       OK ] SliceLayerTest/3.TestSliceAcrossChannels (2 ms)
...
...
[       OK ] ContrastiveLossLayerTest/0.TestGradientLegacy (127 ms)
[----------] 4 tests from ContrastiveLossLayerTest/0 (266 ms total)

[----------] 1 test from LayerFactoryTest/0, where TypeParam = caffe::CPUDevice<float>
[ RUN      ] LayerFactoryTest/0.TestCreateLayer
*** Aborted at 1489289221 (unix time) try "date -d @1489289221" if you are using GNU date ***
PC: @     0x7fd3894b5962 cfree
*** SIGSEGV (@0x1f9) received by PID 11176 (TID 0x7fd393215740) from PID 505; stack trace: ***
    @     0x7fd38980c390 (unknown)
    @     0x7fd3894b5962 cfree
    @     0x7fd38a1302e1 deallocate()
    @     0x7fd38a18c0e0 caffe::DenseCRFLayer<>::DeAllocateAllData()
    @     0x7fd38a190e66 caffe::DenseCRFLayer<>::~DenseCRFLayer()
    @     0x7fd38a1914a9 caffe::DenseCRFLayer<>::~DenseCRFLayer()
    @           0x71fa18 caffe::LayerFactoryTest_TestCreateLayer_Test<>::TestBody()
    @           0x8b5e43 testing::internal::HandleExceptionsInMethodIfSupported<>()
    @           0x8af45a testing::Test::Run()
    @           0x8af5a8 testing::TestInfo::Run()
    @           0x8af685 testing::TestCase::Run()
    @           0x8b095f testing::internal::UnitTestImpl::RunAllTests()
    @           0x8b0c83 testing::UnitTest::Run()
    @           0x46645d main
    @     0x7fd389452830 __libc_start_main
    @           0x46d7e9 _start
    @                0x0 (unknown)
Makefile:526: recipe for target 'runtest' failed
make: *** [runtest] Segmentation fault

Please help! Environment: CUDA 8, cuDNN 5, Ubuntu 16, GTX 950M.

opened by lawpdas 3

error: function "atomicAdd(double *, double)" has already been defined

Hi @hszhao, after I run cp Makefile.config.example Makefile.config and then make -j9, I get this error: ./include/caffe/common.cuh(9): error: function "atomicAdd(double *, double)" has already been defined. How can I fix it? Thanks.

opened by zimenglan-sysu-512 3

Caffe make error

Getting the following error when compiling Caffe with make:

./include/caffe/common.cuh(9): error: function "atomicAdd(double *, double)" has already been defined

1 error detected in the compilation of "/tmp/tmpxft_00003eec_00000000-5_domain_transform_layer.cpp4.ii".

Any solutions?

opened by BAILOOL 2

libmatio.so.2: cannot open shared object file

I downloaded and installed matio with the following commands:

./configure
make
sudo make install

I also set MATLAB_DIR := /usr/local/MATLAB/R2014b in the Makefile.config file. Then I compiled PSPNet using

make -j8 && make matcaffe

I got the success message

MEX matlab/+caffe/private/caffe_.cpp
Building with 'g++'.
Warning: You are using gcc version '5.4.1'. The version of gcc is not supported. The version currently supported with MEX is '4.7.x'. For a list of currently supported compilers see: http://www.mathworks.com/support/compilers/current_release.
MEX completed successfully.

However, when I run the eval_all.m file, I got the error

```
Invalid MEX-file '/home/john/PSPNet/matlab/+caffe/private/caffe_.mexa64': libmatio.so.2: cannot open shared object file: No such file or directory
Error in caffe.reset_all (line 5)
caffe_('reset');
Error in eval_sub (line 20)
caffe.reset_all();
Error in eval_all (line 72)
eval_sub(data_name,data_root,eval_list,model_weights,model_deploy,fea_cha,base_size,crop_size,data_class,data_colormap, ..
```

How can I fix it? Thank you
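An error of this kind usually means the dynamic loader cannot find the library at run time, even though compilation succeeded. As a diagnostic sketch only (not a fix), the Python snippet below asks the system's library search whether `libmatio` can be located, using the standard `ctypes.util.find_library` call. The helper name is hypothetical, and MATLAB may run with a different environment than the shell, so a positive result here does not guarantee that MATLAB will find the library.

```python
import ctypes.util


def describe_library(name):
    """Report whether the system's library search can locate lib<name>."""
    found = ctypes.util.find_library(name)
    if found is None:
        return "lib%s not found on the default search path" % name
    return "lib%s resolves to %s" % (name, found)


if __name__ == "__main__":
    print(describe_library("matio"))
```

If the library is reported as not found, a common cause (an assumption here, since the install prefix is not shown above) is that `make install` placed it in a directory such as /usr/local/lib that the loader does not search until its cache is refreshed or the directory is added to `LD_LIBRARY_PATH`.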

opened by mjohn123 2

[FIXED] Why are scale_factors used to scale pixel values?

I read the function TransformImgAndSeg, which contains the following code.

// perform scaling
if (scale_factors_.size() > 0) {
  int scale_ind = Rand(scale_factors_.size());
  Dtype scale   = scale_factors_[scale_ind];
  
  if (scale != 1) {
    img_height *= scale;
    img_width  *= scale;
    ...

scale is a random scale_factor defined in transform_param, used to scale the size of the image. This makes perfect sense.

However, I noticed that the scale is also used to scale the pixel values in the same function.

if (has_mean_file) {
  int mean_index = (c * img_height + h_off + h) * img_width + w_off + w;
  transformed_data[top_index] =
    (pixel - mean[mean_index]) * scale;
} else {
    if (has_mean_values) {
      transformed_data[top_index] =
        (pixel - mean_values_[c]) * scale;
    } else {
      transformed_data[top_index] = pixel * scale;
    }
}

I think we should scale the pixel values using the scale in transform_param instead of this random scale_factor, right?
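The distinction raised in this question, between a factor that resizes the image and a factor that scales pixel values, can be illustrated with two independent helpers. This is a hypothetical Python sketch with made-up names; it is not the code in the repository and does not claim to show how the issue was fixed.

```python
def resized_shape(height, width, resize_factor):
    """Apply the randomly chosen resize factor to the image size only."""
    return int(height * resize_factor), int(width * resize_factor)


def transform_pixel(pixel, mean, pixel_scale=1.0):
    """Mean-subtract and scale one pixel value, independent of resizing."""
    return (pixel - mean) * pixel_scale


if __name__ == "__main__":
    # The resize factor changes geometry; the pixel scale changes intensity.
    print(resized_shape(100, 200, 1.5))
    print(transform_pixel(128.0, 104.0, pixel_scale=1.0))
```

Keeping the two factors as separate arguments makes it harder to pass the random resize factor where the fixed pixel scale is intended.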

opened by jianchao-li 1

Reason to have a fixed inference size (473x473)

Hi, Thank you for sharing the code and trained models! I have a question specific to the demo in your PyTorch implementation. As I understand, you are using a base_size = 512 to which any incoming image is resized regardless of the input dimensions while maintaining the original aspect ratio. Then the inference is run in a grid fashion on crops of 473x473.

My question is: why have a fixed crop_size or base_size? There is no fully connected layer in the architecture, so why do we have this fixed size? Is this an arbitrary choice of numbers, or is there a solid reason for it?
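The procedure described in this question (resize to a base size, then run inference on a grid of fixed-size crops) can be sketched as follows. The helper names and the stride value are assumptions for illustration; the question does not state how much neighbouring crops overlap, so the stride is left as a parameter.

```python
import math


def scaled_size(height, width, base_size):
    """Resize so the longer side equals base_size, keeping aspect ratio."""
    if height >= width:
        return base_size, max(1, round(base_size * width / height))
    return max(1, round(base_size * height / width)), base_size


def crop_origins(length, crop_size, stride):
    """Start offsets of crops covering [0, length) along one axis."""
    if length <= crop_size:
        return [0]  # a single (padded) crop is enough
    steps = math.ceil((length - crop_size) / stride) + 1
    # Clamp the last crop so it ends exactly at the image border.
    return [min(i * stride, length - crop_size) for i in range(steps)]


if __name__ == "__main__":
    h, w = scaled_size(600, 1280, base_size=512)
    stride = 315  # hypothetical stride, chosen only for this example
    print((h, w), crop_origins(h, 473, stride), crop_origins(w, 473, stride))
```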

Thank you for your time! Best, Touqeer

opened by TouqeerAhmad 1
Questions about PSPNet.
Thank you for uploading your code. It is very helpful to understand PSPNet. I have two questions about your paper.

You wrote

we use a pretrained ResNet model with the dilated network strategy to extract the feature map. The final feature map size is 1/8 of the input image.

in the paper. But I think the feature map size is 1/16 when you use ResNet50. Do you use only the first 3 blocks of ResNet50?

You wrote

Then we directly upsample the low-dimension feature maps to get the same size feature as the original feature map via bilinear interpolation. Finally, different levels of features are concatenated as the final pyramid pooling global feature.

in Section 3.2 in the paper. I understand we have to concatenate resized different levels of features and feature map extracted by ResNet 50. But after that, the image size is 1/8 of the input image. How did you resize them to the same image size as input image?
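The size arithmetic behind both questions can be written out in a few lines. The sketch below assumes an output stride of 8 with the rounding rule `(n - 1) // 8 + 1`, which maps a 473-pixel input to a 60-pixel feature map, and uses the pyramid bin sizes 1, 2, 3 and 6 from the paper. The helper names are hypothetical, and the final upsampling formula is an assumption about one common way to map the score map back to the input size, not a statement about the released code.

```python
def feature_size(input_size, output_stride=8):
    """Spatial size of the backbone output for a given input size."""
    return (input_size - 1) // output_stride + 1


def pooled_kernel_sizes(feat_size, bins=(1, 2, 3, 6)):
    """Pooling kernel per pyramid level so each level yields bin x bin cells."""
    return [feat_size // b for b in bins]


def upsampled_size(feat_size, factor=8):
    """Size after interpolating a feature map back by the output stride."""
    return (feat_size - 1) * factor + 1


if __name__ == "__main__":
    feat = feature_size(473)
    print(feat, pooled_kernel_sizes(feat), upsampled_size(feat))
```

Under these assumptions a stride of 16 would give a 30-pixel map for the same input, which is the 1/16 case the first question refers to.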
opened by kazucmpt 6
math_functions.cu:375 [Check failed: status== CURAND_STATUS_SUCCESS (201 VS. 0) CURAND_STATUS_LAUNCH_FAILURE]

math_functions.cu:375 [Check failed: status== CURAND_STATUS_SUCCESS (201 VS. 0) CURAND_STATUS_LAUNCH_FAILURE] so I can not train the model. Can you help me? Looking forward to your reply!

opened by Alice-kenan 0
Problem with evaluation

Hi,

Thanks for your work. Is it possible to evaluate your results with the Cityscapes evaluation tool? They have some requirements on the value of every pixel.

Thanks!
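The pixel-value requirement mentioned above typically comes down to converting training indices into the dataset's label IDs. The Python sketch below uses the Cityscapes mapping for the 19 evaluated classes. It assumes the model outputs indices 0 to 18 in the standard trainId order, which is not confirmed by this page, so the table should be checked against the official Cityscapes label definitions before use. The helper name and `unknown_id` default are hypothetical.

```python
# Cityscapes trainId -> labelId for the 19 evaluated classes
# (road, sidewalk, building, ..., motorcycle, bicycle).
TRAIN_ID_TO_LABEL_ID = [7, 8, 11, 12, 13, 17, 19, 20, 21, 22,
                        23, 24, 25, 26, 27, 28, 31, 32, 33]


def to_label_ids(prediction, unknown_id=0):
    """Map rows of trainIds to labelIds; out-of-range values become unknown_id."""
    count = len(TRAIN_ID_TO_LABEL_ID)
    return [
        [TRAIN_ID_TO_LABEL_ID[p] if 0 <= p < count else unknown_id for p in row]
        for row in prediction
    ]


if __name__ == "__main__":
    print(to_label_ids([[0, 18], [13, 255]]))
```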

opened by YuShen1116 0

Owner

Hengshuang Zhao

GitHub https://hszhao.github.io/projects/pspnet

