High-resolution networks and Segmentation Transformer for Semantic Segmentation

HRNet

Last update: Jan 7, 2023

Branches

This is the implementation of HRNet + OCR.
The PyTorch 1.1 version is available here.
The PyTorch 0.4.1 version is available here.

News

[2021/05/04] We rephrase the OCR approach as Segmentation Transformer pdf. We will provide the updated implementation soon.
[2021/02/16] Based on the PaddleClas ImageNet pretrained weights, we achieve 83.22% on Cityscapes val, 59.62% on PASCAL-Context val (new SOTA), 45.20% on COCO-Stuff val (new SOTA), 58.21% on LIP val and 47.98% on ADE20K val. Please check out openseg.pytorch for more details.
[2020/08/16] MMSegmentation has supported our HRNet + OCR.
[2020/07/20] The researchers from AInnovation have achieved Rank#1 on ADE20K Leaderboard via training our HRNet + OCR with a semi-supervised learning scheme. More details are in their Technical Report.
[2020/07/09] Our paper is accepted by ECCV 2020: Object-Contextual Representations for Semantic Segmentation. Notably, researchers from NVIDIA set a new state-of-the-art performance on the Cityscapes leaderboard, 85.4%, by combining our HRNet + OCR with a new hierarchical multi-scale attention scheme.
[2020/03/13] Our paper is accepted by TPAMI: Deep High-Resolution Representation Learning for Visual Recognition.
HRNet + OCR + SegFix: Rank #1 (84.5) on the Cityscapes leaderboard. OCR: object-contextual representations (pdf). HRNet + OCR is reproduced here.
Thanks to Google and UIUC researchers: a modified HRNet combined with semantic and instance multi-scale context achieves a SOTA panoptic segmentation result on the Mapillary Vistas challenge. See the paper.
Small HRNet models for Cityscapes segmentation. Superior to MobileNetV2Plus ....
Rank #1 (83.7) on the Cityscapes leaderboard: HRNet combined with an extension of object context.
PyTorch v1.1 and the official Sync-BN are supported. We have reproduced the Cityscapes results on the new codebase. Please check the pytorch-v1.1 branch.

Introduction

This is the official code of high-resolution representations for semantic segmentation. We augment HRNet with a very simple segmentation head, shown in the figure below: we aggregate the output representations at four different resolutions and then use a 1x1 convolution to fuse them. The fused representations are fed into the classifier. We evaluate our methods on three datasets: Cityscapes, PASCAL-Context and LIP.
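To make the head concrete, the sketch below only traces tensor shapes. It assumes the usual HRNetV2-W48 layout (branch widths 48, 96, 192 and 384 channels at 1/4, 1/8, 1/16 and 1/32 of the input resolution), that the lower-resolution maps are upsampled to the highest resolution before concatenation, and 19 output classes as on Cityscapes; these are illustrative assumptions, not values read from this codebase, and the function name is hypothetical.

```python
def hrnetv2_head_shapes(height, width, base_width=48, num_classes=19):
    """Trace (channels, height, width) shapes through an HRNetV2-style head.

    Assumption: branch i has base_width * 2**i channels at stride 4 * 2**i.
    """
    branches = [
        (base_width * 2 ** i, height // (4 * 2 ** i), width // (4 * 2 ** i))
        for i in range(4)
    ]
    # Every branch is upsampled to the highest resolution (stride 4) ...
    _, out_h, out_w = branches[0]
    # ... and the four maps are concatenated along the channel axis.
    concat_channels = sum(c for c, _, _ in branches)
    # The 1x1 fusion convolution mixes channels without changing the spatial
    # size; the classifier then maps features to per-class scores.
    return {
        "branches": branches,
        "concat": (concat_channels, out_h, out_w),
        "logits": (num_classes, out_h, out_w),
    }


print(hrnetv2_head_shapes(512, 1024)["concat"])  # (720, 128, 256)
```

Under these assumptions a 512x1024 training crop yields 720 concatenated channels at 128x256, so the logits are predicted at 1/4 resolution and upsampled to the label size afterwards.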

In addition, we combine HRNet with Object-Contextual Representations (OCR) and achieve higher performance on the three datasets. The code of HRNet+OCR is contained in this branch. The overall framework of OCR and the equivalent Transformer pipelines are illustrated in the figure.
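The core OCR computation can be read as three steps: soft object regions are normalized over pixels and used to pool pixel features into one representation per region; each pixel then attends to those region representations with a softmax over similarity scores; and the attention-weighted region representations form the pixel's object context. The sketch below follows those steps on plain Python lists. It is a simplification under stated assumptions: the learned transformations applied to queries, keys and values in the full model are replaced by identities, and the function names are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ocr_context(pixels, region_logits):
    """Return (region representations, per-pixel object context).

    pixels:        N pixel feature vectors, each of length C.
    region_logits: K lists of N coarse scores, one list per object class.
    Learned transforms are replaced by identities in this sketch.
    """
    channels = len(pixels[0])
    # Step 1: spatial softmax per region, then weighted pooling of pixels.
    regions = []
    for logits in region_logits:
        weights = softmax(logits)
        regions.append([sum(w * p[c] for w, p in zip(weights, pixels))
                        for c in range(channels)])
    # Steps 2 and 3: each pixel attends over the K region representations.
    context = []
    for p in pixels:
        attn = softmax([dot(p, f) for f in regions])
        context.append([sum(a * f[c] for a, f in zip(attn, regions))
                        for c in range(channels)])
    return regions, context


regions, context = ocr_context([[1.0, 0.0], [0.0, 1.0]],
                               [[10.0, -10.0], [-10.0, 10.0]])
```

In the full model the context is concatenated with the original pixel feature and transformed again before classification; that step is omitted here.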

Segmentation models

The models are initialized with weights pretrained on ImageNet. "Paddle" indicates results based on the PaddleClas pretrained HRNet models. You can download the pretrained models from https://github.com/HRNet/HRNet-Image-Classification. Note that we use align_corners = True for upsampling in HRNet, which differs from the PyTorch default (align_corners = False).
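The align_corners flag changes how each output pixel index maps back to an input coordinate during linear upsampling. The sketch below writes out the two usual 1-D mapping conventions (as described for PyTorch's interpolate) to show the difference; it illustrates the convention only and is not code from this repository.

```python
def source_coordinate(dst, in_size, out_size, align_corners):
    """Map output index dst to a fractional input coordinate (1-D linear)."""
    if align_corners:
        if out_size == 1:
            return 0.0
        # The first and last pixels of input and output are aligned exactly.
        return dst * (in_size - 1) / (out_size - 1)
    # Pixel centres are aligned; negative coordinates are clamped to 0.
    src = (dst + 0.5) * in_size / out_size - 0.5
    return max(src, 0.0)


# Upsampling 4 -> 8: the last output pixel maps to input 3.0 vs 3.25.
print(source_coordinate(7, 4, 8, align_corners=True))   # 3.0
print(source_coordinate(7, 4, 8, align_corners=False))  # 3.25
```

Because the two conventions sample slightly different positions, a checkpoint trained with one setting may lose some accuracy when evaluated with the other.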

Performance on the Cityscapes dataset. The models are trained and tested with the input size of 512x1024 and 1024x2048 respectively. If multi-scale testing is used, we adopt scales: 0.5,0.75,1.0,1.25,1.5,1.75.

model	Train Set	Test Set	OHEM	Multi-scale	Flip	mIoU	Link
HRNetV2-W48	Train	Val	No	No	No	80.9	Github/BaiduYun(Access Code:pmix)
HRNetV2-W48 + OCR	Train	Val	No	No	No	81.6	Github/BaiduYun(Access Code:fa6i)
HRNetV2-W48 + OCR	Train + Val	Test	No	Yes	Yes	82.3	Github/BaiduYun(Access Code:ycrk)
HRNetV2-W48 (Paddle)	Train	Val	No	No	No	81.6	---
HRNetV2-W48 + OCR (Paddle)	Train	Val	No	No	No	---	---
HRNetV2-W48 + OCR (Paddle)	Train + Val	Test	No	Yes	Yes	---	---

Performance on the LIP dataset. The models are trained and tested with the input size of 473x473.

model	OHEM	Multi-scale	Flip	mIoU	Link
HRNetV2-W48	No	No	Yes	55.83	Github/BaiduYun(Access Code:fahi)
HRNetV2-W48 + OCR	No	No	Yes	56.48	Github/BaiduYun(Access Code:xex2)
HRNetV2-W48 (Paddle)	No	No	Yes	---	---
HRNetV2-W48 + OCR (Paddle)	No	No	Yes	---	---

Note: currently we can only reproduce the HRNet+OCR results on the LIP dataset with PyTorch 0.4.1.

Performance on the PASCAL-Context dataset. The models are trained and tested with the input size of 520x520. If multi-scale testing is used, we adopt scales: 0.5,0.75,1.0,1.25,1.5,1.75,2.0 (the same as EncNet, DANet etc.).

model	num classes	OHEM	Multi-scale	Flip	mIoU	Link
HRNetV2-W48	59 classes	No	Yes	Yes	54.1	Github/BaiduYun(Access Code:wz6v)
HRNetV2-W48 + OCR	59 classes	No	Yes	Yes	56.2	Github/BaiduYun(Access Code:yyxh)
HRNetV2-W48	60 classes	No	Yes	Yes	48.3	OneDrive/BaiduYun(Access Code:9uf8)
HRNetV2-W48 + OCR	60 classes	No	Yes	Yes	50.1	Github/BaiduYun(Access Code:gtkb)
HRNetV2-W48 (Paddle)	59 classes	No	Yes	Yes	---	---
HRNetV2-W48 (Paddle)	60 classes	No	Yes	Yes	---	---
HRNetV2-W48 + OCR (Paddle)	59 classes	No	Yes	Yes	---	---
HRNetV2-W48 + OCR (Paddle)	60 classes	No	Yes	Yes	---	---
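The 59-class and 60-class PASCAL-Context settings differ in whether the background class is scored. A common convention, assumed here rather than taken from this codebase, stores background as label 0 in the 60-class annotation. The 59-class setting then shifts the remaining labels down by one and marks background with an ignore index (255 here, also an assumption), so background pixels count neither in the loss nor in the mIoU.

```python
IGNORE_INDEX = 255  # assumed ignore value; use the one from your config


def to_59_classes(labels, ignore_index=IGNORE_INDEX):
    """Convert 60-class labels (0 = background) to 59-class labels."""
    return [ignore_index if v == 0 else v - 1 for v in labels]


print(to_59_classes([0, 1, 5, 59]))  # [255, 0, 4, 58]
```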

Performance on the COCO-Stuff dataset. The models are trained and tested with the input size of 520x520. If multi-scale testing is used, we adopt scales: 0.5,0.75,1.0,1.25,1.5,1.75,2.0 (the same as EncNet, DANet etc.).

model	OHEM	Multi-scale	Flip	mIoU	Link
HRNetV2-W48	Yes	No	No	36.2	Github/BaiduYun(Access Code:92gw)
HRNetV2-W48 + OCR	Yes	No	No	39.7	Github/BaiduYun(Access Code:sjc4)
HRNetV2-W48	Yes	Yes	Yes	37.9	Github/BaiduYun(Access Code:92gw)
HRNetV2-W48 + OCR	Yes	Yes	Yes	40.6	Github/BaiduYun(Access Code:sjc4)
HRNetV2-W48 (Paddle)	Yes	No	No	---	---
HRNetV2-W48 + OCR (Paddle)	Yes	No	No	---	---
HRNetV2-W48 (Paddle)	Yes	Yes	Yes	---	---
HRNetV2-W48 + OCR (Paddle)	Yes	Yes	Yes	---	---
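OHEM (online hard example mining) computes the loss only on pixels the model currently finds hard. The sketch below shows one common segmentation variant: pixels whose predicted probability for the ground-truth class falls below a threshold are kept, and the threshold is raised if needed so that at least min_kept pixels contribute. The threshold and min_kept defaults and the function name are illustrative assumptions, not this repository's settings.

```python
import math


def ohem_cross_entropy(gt_probs, thresh=0.7, min_kept=100000):
    """Mean cross-entropy over the hardest pixels.

    gt_probs: predicted probability of the ground-truth class for each
    non-ignored pixel (all values assumed to be > 0).
    """
    if not gt_probs:
        return 0.0
    ranked = sorted(gt_probs)  # hardest pixels (lowest probability) first
    k = max(1, min(min_kept, len(ranked)))
    # If fewer than k pixels fall below thresh, raise the threshold so that
    # at least k pixels still contribute to the loss.
    threshold = max(thresh, ranked[k - 1])
    kept = [p for p in ranked if p <= threshold]
    return sum(-math.log(p) for p in kept) / len(kept)


# Only the two hard pixels (0.2 and 0.5) contribute to the loss.
loss = ohem_cross_entropy([0.9, 0.95, 0.5, 0.2], thresh=0.7, min_kept=1)
```

Ranking by ground-truth probability rather than by raw loss is equivalent here, because cross-entropy decreases monotonically as that probability grows.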

Performance on the ADE20K dataset. The models are trained and tested with the input size of 520x520. If multi-scale testing is used, we adopt scales: 0.5,0.75,1.0,1.25,1.5,1.75,2.0 (the same as EncNet, DANet etc.).

model	OHEM	Multi-scale	Flip	mIoU	Link
HRNetV2-W48	Yes	No	No	43.1	Github/BaiduYun(Access Code:f6xf)
HRNetV2-W48 + OCR	Yes	No	No	44.5	Github/BaiduYun(Access Code:peg4)
HRNetV2-W48	Yes	Yes	Yes	44.2	Github/BaiduYun(Access Code:f6xf)
HRNetV2-W48 + OCR	Yes	Yes	Yes	45.5	Github/BaiduYun(Access Code:peg4)
HRNetV2-W48 (Paddle)	Yes	No	No	---	---
HRNetV2-W48 + OCR (Paddle)	Yes	No	No	---	---
HRNetV2-W48 (Paddle)	Yes	Yes	Yes	---	---
HRNetV2-W48 + OCR (Paddle)	Yes	Yes	Yes	---	---
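All mIoU numbers above are a mean over classes of the intersection-over-union, IoU_c = TP_c / (TP_c + FP_c + FN_c). A minimal sketch of that computation from a confusion matrix follows; the function names and the ignore index of 255 are assumptions, and skipping classes absent from both prediction and ground truth is one common choice that this repository may not make.

```python
def confusion_matrix(gt, pred, num_classes, ignore_index=255):
    """Accumulate a num_classes x num_classes matrix from flat label lists."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g != ignore_index:
            matrix[g][p] += 1
    return matrix


def mean_iou(confusion):
    """Mean IoU; confusion[g][p] counts ground truth g predicted as p."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # missed pixels of c
        fp = sum(confusion[g][c] for g in range(n)) - tp   # wrongly called c
        if tp + fp + fn > 0:
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious) if ious else 0.0


cm = confusion_matrix([0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 255],
                      [0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0], num_classes=2)
print(cm)  # [[3, 1], [2, 4]]
```

For this toy matrix, class 0 has IoU 3/6 and class 1 has IoU 4/7, so the mIoU is about 0.536.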

Quick start

Install

For LIP dataset, install PyTorch=0.4.1 following the official instructions. For Cityscapes and PASCAL-Context, we use PyTorch=1.1.0.
git clone https://github.com/HRNet/HRNet-Semantic-Segmentation $SEG_ROOT
Install dependencies: pip install -r requirements.txt

If you want to train and evaluate our models on PASCAL-Context, you need to install the Detail API:

pip install git+https://github.com/zhanghang1989/detail-api.git#subdirectory=PythonAPI

Data preparation

You need to download the Cityscapes, LIP and PASCAL-Context datasets, plus COCO-Stuff and ADE20K if you use them.

Your directory tree should look like this:

$SEG_ROOT/data
├── cityscapes
│   ├── gtFine
│   │   ├── test
│   │   ├── train
│   │   └── val
│   └── leftImg8bit
│       ├── test
│       ├── train
│       └── val
├── lip
│   ├── TrainVal_images
│   │   ├── train_images
│   │   └── val_images
│   └── TrainVal_parsing_annotations
│       ├── train_segmentations
│       ├── train_segmentations_reversed
│       └── val_segmentations
├── pascal_ctx
│   ├── common
│   ├── PythonAPI
│   ├── res
│   └── VOCdevkit
│       └── VOC2010
├── cocostuff
│   ├── train
│   │   ├── image
│   │   └── label
│   └── val
│       ├── image
│       └── label
├── ade20k
│   ├── train
│   │   ├── image
│   │   └── label
│   └── val
│       ├── image
│       └── label
├── list
│   ├── cityscapes
│   │   ├── test.lst
│   │   ├── trainval.lst
│   │   └── val.lst
│   ├── lip
│   │   ├── testvalList.txt
│   │   ├── trainList.txt
│   │   └── valList.txt
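Before training, it can help to check that the expected folders exist. The sketch below checks a subset of the tree above with pathlib; the data root argument is whatever you used for $SEG_ROOT/data, and the helper name and the chosen subset of folders are assumptions for illustration.

```python
from pathlib import Path

# A subset of the expected layout, relative to $SEG_ROOT/data.
EXPECTED_DIRS = {
    "cityscapes": ["gtFine/train", "gtFine/val",
                   "leftImg8bit/train", "leftImg8bit/val"],
    "lip": ["TrainVal_images/train_images", "TrainVal_images/val_images",
            "TrainVal_parsing_annotations/train_segmentations",
            "TrainVal_parsing_annotations/val_segmentations"],
    "pascal_ctx": ["VOCdevkit/VOC2010"],
    "list": ["cityscapes", "lip"],
}


def missing_dirs(data_root, datasets=None):
    """Return expected sub-directories that do not exist under data_root."""
    root = Path(data_root)
    missing = []
    for name in datasets or EXPECTED_DIRS:
        for rel in EXPECTED_DIRS[name]:
            path = root / name / rel
            if not path.is_dir():
                missing.append(path.relative_to(root).as_posix())
    return missing


# Example: print(missing_dirs("data", ["cityscapes", "list"]))
```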

Train and Test

PyTorch Version Differences

Note that the codebase supports both PyTorch 0.4.1 and 1.1.0, and the two versions use different commands to start training. Below, $PY_CMD denotes the startup command for each version:

# For PyTorch 0.4.1
PY_CMD="python"
# For PyTorch 1.1.0
PY_CMD="python -m torch.distributed.launch --nproc_per_node=4"

For example, we train on Cityscapes with PyTorch 1.1.0, so the command

$PY_CMD tools/train.py --cfg experiments/cityscapes/seg_hrnet_ocr_w48_train_512x1024_sgd_lr1e-2_wd5e-4_bs_12_epoch484.yaml

expands to

python -m torch.distributed.launch --nproc_per_node=4 tools/train.py --cfg experiments/cityscapes/seg_hrnet_ocr_w48_train_512x1024_sgd_lr1e-2_wd5e-4_bs_12_epoch484.yaml
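The same choice of launcher can be scripted in Python, for example when queuing several runs. The sketch below builds the argument lists that correspond to the two PY_CMD values above; the helper name is hypothetical, and the default of 4 processes simply mirrors the example command.

```python
import shlex


def launch_command(torch_version, cfg, nproc_per_node=4):
    """Build the training command for a PyTorch version string."""
    major_minor = tuple(int(x) for x in torch_version.split(".")[:2])
    if major_minor >= (1, 1):
        # PyTorch 1.1.0: one process per GPU via torch.distributed.launch.
        prefix = ["python", "-m", "torch.distributed.launch",
                  "--nproc_per_node={}".format(nproc_per_node)]
    else:
        # PyTorch 0.4.1: a single process.
        prefix = ["python"]
    return prefix + ["tools/train.py", "--cfg", cfg]


cfg = ("experiments/cityscapes/seg_hrnet_ocr_w48_train_512x1024"
       "_sgd_lr1e-2_wd5e-4_bs_12_epoch484.yaml")
print(" ".join(shlex.quote(arg) for arg in launch_command("1.1.0", cfg)))
```

Passing the list to subprocess.run avoids shell quoting entirely; the joined string is only for display.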

Training

Just specify the configuration file for tools/train.py.

For example, train the HRNet-W48 on Cityscapes with a batch size of 12 on 4 GPUs:

$PY_CMD tools/train.py --cfg experiments/cityscapes/seg_hrnet_w48_train_512x1024_sgd_lr1e-2_wd5e-4_bs_12_epoch484.yaml

For example, train the HRNet-W48 + OCR on Cityscapes with a batch size of 12 on 4 GPUs:

$PY_CMD tools/train.py --cfg experiments/cityscapes/seg_hrnet_ocr_w48_train_512x1024_sgd_lr1e-2_wd5e-4_bs_12_epoch484.yaml
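The experiment file names appear to encode the main hyper-parameters: crop size, learning rate, weight decay, batch size and number of epochs. The sketch below parses that pattern. The naming convention is inferred from the file names in this README, so treat the regular expression and the helper name as assumptions and read the YAML file itself for authoritative values.

```python
import re
from pathlib import Path

PATTERN = re.compile(
    r"_w(?P<width>\d+)_.*?(?P<h>\d+)x(?P<w>\d+)_.*?"
    r"lr(?P<lr>[\d.e-]+)_wd(?P<wd>[\d.e-]+)_bs_(?P<bs>\d+)_epoch(?P<epoch>\d+)"
)


def parse_cfg_name(path):
    """Extract hyper-parameters encoded in an experiment file name."""
    name = Path(path).stem
    m = PATTERN.search(name)
    if m is None:
        raise ValueError("unrecognised config name: " + name)
    return {
        "ocr": "_ocr_" in name,
        "ohem": "_ohem_" in name,
        "width": int(m["width"]),
        "crop": (int(m["h"]), int(m["w"])),
        "lr": float(m["lr"]),
        "weight_decay": float(m["wd"]),
        "batch_size": int(m["bs"]),
        "epochs": int(m["epoch"]),
    }


print(parse_cfg_name("experiments/cityscapes/seg_hrnet_ocr_w48_train_"
                     "512x1024_sgd_lr1e-2_wd5e-4_bs_12_epoch484.yaml"))
```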

Note that we have only reproduced HRNet+OCR on the LIP dataset with PyTorch 0.4.1, so we recommend using PyTorch 0.4.1 if you want to train on LIP.

Testing

For example, evaluating HRNet+OCR on the Cityscapes validation set with multi-scale and flip testing:

python tools/test.py --cfg experiments/cityscapes/seg_hrnet_ocr_w48_train_512x1024_sgd_lr1e-2_wd5e-4_bs_12_epoch484.yaml \
                     TEST.MODEL_FILE hrnet_ocr_cs_8162_torch11.pth \
                     TEST.SCALE_LIST 0.5,0.75,1.0,1.25,1.5,1.75 \
                     TEST.FLIP_TEST True

Evaluating HRNet+OCR on the Cityscapes test set with multi-scale and flip testing:

python tools/test.py --cfg experiments/cityscapes/seg_hrnet_ocr_w48_train_512x1024_sgd_lr1e-2_wd5e-4_bs_12_epoch484.yaml \
                     DATASET.TEST_SET list/cityscapes/test.lst \
                     TEST.MODEL_FILE hrnet_ocr_trainval_cs_8227_torch11.pth \
                     TEST.SCALE_LIST 0.5,0.75,1.0,1.25,1.5,1.75 \
                     TEST.FLIP_TEST True

Evaluating HRNet+OCR on the PASCAL-Context validation set with multi-scale and flip testing:

python tools/test.py --cfg experiments/pascal_ctx/seg_hrnet_ocr_w48_cls59_520x520_sgd_lr1e-3_wd1e-4_bs_16_epoch200.yaml \
                     DATASET.TEST_SET testval \
                     TEST.MODEL_FILE hrnet_ocr_pascal_ctx_5618_torch11.pth \
                     TEST.SCALE_LIST 0.5,0.75,1.0,1.25,1.5,1.75,2.0 \
                     TEST.FLIP_TEST True

Evaluating HRNet+OCR on the LIP validation set with flip testing:

python tools/test.py --cfg experiments/lip/seg_hrnet_w48_473x473_sgd_lr7e-3_wd5e-4_bs_40_epoch150.yaml \
                     DATASET.TEST_SET list/lip/testvalList.txt \
                     TEST.MODEL_FILE hrnet_ocr_lip_5648_torch04.pth \
                     TEST.FLIP_TEST True \
                     TEST.NUM_SAMPLES 0

Evaluating HRNet+OCR on the COCO-Stuff validation set with multi-scale and flip testing:

python tools/test.py --cfg experiments/cocostuff/seg_hrnet_ocr_w48_520x520_ohem_sgd_lr1e-3_wd1e-4_bs_16_epoch110.yaml \
                     DATASET.TEST_SET list/cocostuff/testval.lst \
                     TEST.MODEL_FILE hrnet_ocr_cocostuff_3965_torch04.pth \
                     TEST.SCALE_LIST 0.5,0.75,1.0,1.25,1.5,1.75,2.0 \
                     TEST.MULTI_SCALE True TEST.FLIP_TEST True

Evaluating HRNet+OCR on the ADE20K validation set with multi-scale and flip testing:

python tools/test.py --cfg experiments/ade20k/seg_hrnet_ocr_w48_520x520_ohem_sgd_lr2e-2_wd1e-4_bs_16_epoch120.yaml \
                     DATASET.TEST_SET list/ade20k/testval.lst \
                     TEST.MODEL_FILE hrnet_ocr_ade20k_4451_torch04.pth \
                     TEST.SCALE_LIST 0.5,0.75,1.0,1.25,1.5,1.75,2.0 \
                     TEST.MULTI_SCALE True TEST.FLIP_TEST True
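The KEY VALUE pairs that follow --cfg in the commands above (for example TEST.SCALE_LIST or TEST.FLIP_TEST) override entries from the YAML file. The sketch below shows how such a flat override list can become nested settings, in the spirit of yacs-style merge_from_list; it assumes the codebase uses a yacs-like config and is not the repository's own parser. Real yacs also rejects unknown keys and checks value types, which this sketch does not.

```python
import ast


def apply_overrides(config, opts):
    """Apply KEY VALUE override pairs with dotted keys to a nested dict."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for key, raw in zip(opts[0::2], opts[1::2]):
        try:
            # "True" -> True, "0.5,1.0" -> (0.5, 1.0)
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            value = raw  # plain strings such as file paths
        *parents, leaf = key.split(".")
        node = config
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return config


cfg = apply_overrides({"TEST": {"FLIP_TEST": False}},
                      ["TEST.SCALE_LIST", "0.5,0.75,1.0,1.25,1.5,1.75",
                       "TEST.FLIP_TEST", "True",
                       "DATASET.TEST_SET", "list/cityscapes/test.lst"])
```

Note that a comma-separated scale list is parsed as a tuple of floats, while paths and model file names fall back to plain strings.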

Other applications of HRNet

Citation

If you find this work or code helpful in your research, please cite:

@inproceedings{SunXLW19,
  title={Deep High-Resolution Representation Learning for Human Pose Estimation},
  author={Ke Sun and Bin Xiao and Dong Liu and Jingdong Wang},
  booktitle={CVPR},
  year={2019}
}

@article{WangSCJDZLMTWLX19,
  title={Deep High-Resolution Representation Learning for Visual Recognition},
  author={Jingdong Wang and Ke Sun and Tianheng Cheng and 
          Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and 
          Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
  journal={TPAMI},
  year={2019}
}

@inproceedings{YuanCW19,
  title={Object-Contextual Representations for Semantic Segmentation},
  author={Yuhui Yuan and Xilin Chen and Jingdong Wang},
  booktitle={ECCV},
  year={2020}
}

Reference

[1] Deep High-Resolution Representation Learning for Visual Recognition. Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, Wenyu Liu, Bin Xiao. Accepted by TPAMI.

[2] Object-Contextual Representations for Semantic Segmentation. Yuhui Yuan, Xilin Chen, Jingdong Wang. ECCV 2020.

Acknowledgement

We use the sync-bn implementation from InplaceABN for the PyTorch 0.4.1 experiments and the official sync-bn provided by PyTorch for the PyTorch 1.1.0 experiments.

We use the data preprocessing for the PASCAL-Context dataset implemented by the PASCAL API.

Comments

RuntimeError: Ninja is required to load C++ extensions
Hello. First I got this error: RuntimeError: Ninja is required to load C++ extensions. After I successfully ran pip install ninja, I then got this: /usr/local/lib/python3.5/dist-packages/torch/utils/cpp_extension.py:118: UserWarning:

!! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) may be ABI-incompatible with PyTorch! Please use a compiler that is ABI-compatible with GCC 4.9 and above. See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html.

See https://gist.github.com/goldsborough/d466f43e8ffc948ff92de7486c5216d6 for instructions on how to install GCC 4.9 or higher. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

!! WARNING !!

warnings.warn(ABI_INCOMPATIBILITY_WARNING.format(compiler))
Traceback (most recent call last):
  File "tools/train.py", line 27, in <module>
    import models
  File "/data/HRNet-Semantic-Segmentation-master/tools/../lib/models/__init__.py", line 11, in <module>
    import models.seg_hrnet
  File "/data/HRNet-Semantic-Segmentation-master/tools/../lib/models/seg_hrnet.py", line 22, in <module>
    from .sync_bn.inplace_abn.bn import InPlaceABNSync
  File "/data/HRNet-Semantic-Segmentation-master/tools/../lib/models/sync_bn/__init__.py", line 1, in <module>
    from .inplace_abn import bn
  File "/data/HRNet-Semantic-Segmentation-master/tools/../lib/models/sync_bn/inplace_abn/__init__.py", line 1, in <module>
    from .bn import ABN, InPlaceABN, InPlaceABNSync
  File "/data/HRNet-Semantic-Segmentation-master/tools/../lib/models/sync_bn/inplace_abn/bn.py", line 14, in <module>
    from functions import *
  File "/data/HRNet-Semantic-Segmentation-master/lib/models/sync_bn/inplace_abn/functions.py", line 16, in <module>
    extra_cuda_cflags=["--expt-extended-lambda"])
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/cpp_extension.py", line 514, in load
    with_cuda=with_cuda)
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/cpp_extension.py", line 690, in _jit_compile
    return _import_module_from_library(name, build_directory)
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/cpp_extension.py", line 773, in _import_module_from_library
    return imp.load_module(module_name, file, path, description)
  File "/usr/lib/python3.5/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.5/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: /tmp/torch_extensions/inplace_abn/inplace_abn.so: undefined symbol: _ZN2at5ErrorC1ENS_14SourceLocationESs

Does this BN need to be compiled against the same PyTorch version? My pytorch==0.4.1.
opened by GuangyanZhang 27

ImportError: No module named 'inplace_abn'

I tried image segmentation using the "HRNetV2-W18-Small-v2" small model with the Cityscapes dataset.

I have installed all modules listed in the requirements.txt file with matching versions. My configuration is: Python 3.6, CUDA 9.2, ninja 1.8.2, PyTorch 0.4.1.

I completed the steps up to data preparation and then tried to train using the following command:

python tools/train.py --cfg experiments/cityscapes/seg_hrnet_w18_small_v1_512x1024_sgd_lr1e-2_wd5e-4_bs_12_epoch484

I am getting the below error.

warnings.warn(ABI_INCOMPATIBILITY_WARNING.format(compiler)) Traceback (most recent call last): 
File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 759, in _build_extension_module ['ninja', '-v'], stderr=subprocess.STDOUT, cwd=build_directory) 
File "/usr/lib/python3.6/subprocess.py", line 356, in check_output **kwargs).stdout 
File "/usr/lib/python3.6/subprocess.py", line 438, in run output=stdout, stderr=stderr) subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 514, in load with_cuda=with_cuda) 
File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 690, in _jit_compile return _import_module_from_library(name, build_directory) 
File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 770, in _import_module_from_library file, path, description = imp.find_module(module_name, [path]) 
File "/usr/lib/python3.6/imp.py", line 297, in find_module raise ImportError(_ERR_MSG.format(name), name=name)
ImportError: No module named 'inplace_abn'

To solve this, I have tried different versions of ninja and CUDA, but with no luck. Any help, please!

opened by InternetMaster1 16

subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
ninja is already installed, but the error still occurs. /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/utils/cpp_extension.py:166: UserWarning:

!! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.

See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

!! WARNING !!

platform=sys.platform)) Traceback (most recent call last): File "/cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 949, in _build_extension_module check=True) File "/cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/subprocess.py", line 438, in run output=stdout, stderr=stderr) subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "tools/train.py", line 27, in import models File "/cluster/home/it_stu21/main/HRNet-Semantic/tools/../lib/models/init.py", line 11, in import models.seg_hrnet File "/cluster/home/it_stu21/main/HRNet-Semantic/tools/../lib/models/seg_hrnet.py", line 22, in from .sync_bn.inplace_abn.bn import InPlaceABNSync File "/cluster/home/it_stu21/main/HRNet-Semantic/tools/../lib/models/sync_bn/init.py", line 1, in from .inplace_abn import bn File "/cluster/home/it_stu21/main/HRNet-Semantic/tools/../lib/models/sync_bn/inplace_abn/init.py", line 1, in from .bn import ABN, InPlaceABN, InPlaceABNSync File "/cluster/home/it_stu21/main/HRNet-Semantic/tools/../lib/models/sync_bn/inplace_abn/bn.py", line 14, in from functions import * File "/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/functions.py", line 16, in extra_cuda_cflags=["--expt-extended-lambda"]) File "/cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 644, in load is_python_module) File "/cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 813, in jit_compile with_cuda=with_cuda) File "/cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 866, in write_ninja_file_and_build build_extension_module(name, build_directory, verbose) File "/cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 962, in build_extension_module raise RuntimeError(message) RuntimeError: Error building extension 'inplace_abn': b'[1/4] c++ -MMD -MF inplace_abn_cpu.o.d -DTORCH_EXTENSION_NAME=inplace_abn -DTORCH_API_INCLUDE_EXTENSION_H -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem 
/cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/TH -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/THC -isystem /cluster/apps/cuda/10.0/include -isystem /cluster/home/it_stu21/.conda/envs/mm/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -O3 -c /cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cpu.cpp -o inplace_abn_cpu.o\nFAILED: inplace_abn_cpu.o \nc++ -MMD -MF inplace_abn_cpu.o.d -DTORCH_EXTENSION_NAME=inplace_abn -DTORCH_API_INCLUDE_EXTENSION_H -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/TH -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/THC -isystem /cluster/apps/cuda/10.0/include -isystem /cluster/home/it_stu21/.conda/envs/mm/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -O3 -c /cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cpu.cpp -o inplace_abn_cpu.o\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cpu.cpp: In function \xe2\x80\x98std::vectorat::Tensor backward_cpu(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, bool, float)\xe2\x80\x99:\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cpu.cpp:82:41: error: could not convert \xe2\x80\x98z.at::Tensor::type()\xe2\x80\x99 from \xe2\x80\x98at::DeprecatedTypeProperties\xe2\x80\x99 to \xe2\x80\x98c10::IntArrayRef {aka c10::ArrayRef}\xe2\x80\x99\n auto dweight = at::empty(z.type(), {0});\n ^\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cpu.cpp:83:39: error: could not convert 
\xe2\x80\x98z.at::Tensor::type()\xe2\x80\x99 from \xe2\x80\x98at::DeprecatedTypeProperties\xe2\x80\x99 to \xe2\x80\x98c10::IntArrayRef {aka c10::ArrayRef}\xe2\x80\x99\n auto dbias = at::empty(z.type(), {0});\n ^\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cpu.cpp:89:29: error: could not convert \xe2\x80\x98{dx, dweight, dbias}\xe2\x80\x99 from \xe2\x80\x98\xe2\x80\x99 to \xe2\x80\x98std::vectorat::Tensor\xe2\x80\x99\n return {dx, dweight, dbias};\n ^\n[2/4] /cluster/apps/cuda/10.0/bin/nvcc -DTORCH_EXTENSION_NAME=inplace_abn -DTORCH_API_INCLUDE_EXTENSION_H -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/TH -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/THC -isystem /cluster/apps/cuda/10.0/include -isystem /cluster/home/it_stu21/.conda/envs/mm/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' --expt-extended-lambda -std=c++11 -c /cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu -o inplace_abn_cuda.cuda.o\nFAILED: inplace_abn_cuda.cuda.o \n/cluster/apps/cuda/10.0/bin/nvcc -DTORCH_EXTENSION_NAME=inplace_abn -DTORCH_API_INCLUDE_EXTENSION_H -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/TH -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/THC -isystem /cluster/apps/cuda/10.0/include -isystem 
/cluster/home/it_stu21/.conda/envs/mm/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' --expt-extended-lambda -std=c++11 -c /cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu -o inplace_abn_cuda.cuda.o\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu(99): error: no suitable user-defined conversion from "at::DeprecatedTypeProperties" to "c10::IntArrayRef" exists\n\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu(99): error: no instance of constructor "c10::TensorOptions::TensorOptions" matches the argument list\n argument types are: (int64_t)\n\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu(100): error: no suitable user-defined conversion from "at::DeprecatedTypeProperties" to "c10::IntArrayRef" exists\n\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu(100): error: no instance of constructor "c10::TensorOptions::TensorOptions" matches the argument list\n argument types are: (int64_t)\n\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu(202): error: no suitable user-defined conversion from "at::DeprecatedTypeProperties" to "c10::IntArrayRef" exists\n\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu(202): error: no instance of constructor "c10::TensorOptions::TensorOptions" matches the argument list\n argument types are: (int64_t)\n\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu(203): error: no suitable user-defined conversion from "at::DeprecatedTypeProperties" to "c10::IntArrayRef" 
exists\n\n/cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu(203): error: no instance of constructor "c10::TensorOptions::TensorOptions" matches the argument list\n argument types are: (int64_t)\n\n8 errors detected in the compilation of "/tmp/tmpxft_0002e7bc_00000000-6_inplace_abn_cuda.cpp1.ii".\n[3/4] c++ -MMD -MF inplace_abn.o.d -DTORCH_EXTENSION_NAME=inplace_abn -DTORCH_API_INCLUDE_EXTENSION_H -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/TH -isystem /cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/THC -isystem /cluster/apps/cuda/10.0/include -isystem /cluster/home/it_stu21/.conda/envs/mm/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -O3 -c /cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn.cpp -o inplace_abn.o\nIn file included from /cluster/home/it_stu21/main/HRNet-Semantic/lib/models/sync_bn/inplace_abn/src/inplace_abn.cpp:1:0:\n/cluster/home/it_stu21/.conda/envs/mm/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/torch.h:7:2: warning: #warning "Including torch/torch.h for C++ extensions is deprecated. Please include torch/extension.h" [-Wcpp]\n #warning \\n ^\nninja: build stopped: subcommand failed.\n'
opened by gazelxu 14
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/generated/../THCReduceAll.cuh:317 terminate called after throwing an instance of 'at::Error'

I have hit an error and I really don't know how to solve it. Help! Someone said, "Maybe your labels are out of range." But my labels are from 0 to n-1! I need your help. Thanks!

/opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [3,0,0], thread: [574,0,0] Assertion `t >= 0 && t < n_classes` failed.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/generated/../THCReduceAll.cuh line=317 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "/home/cartur/HRNet-Semantic-Segmentation/tools/train.py", line 251, in <module>
    main()
  File "/home/cartur/HRNet-Semantic-Segmentation/tools/train.py", line 220, in main
    trainloader, optimizer, model, writer_dict)
  File "/home/cartur/HRNet-Semantic-Segmentation/tools/../lib/core/function.py", line 46, in train
    loss = losses.mean()
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/generated/../THCReduceAll.cuh:317
terminate called after throwing an instance of 'at::Error'
  what(): CUDA error: invalid device pointer (CudaCachingDeleter at /opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/THCCachingAllocator.cpp:498)
frame #0: THStorage_free + 0x44 (0x7fd7638cf314 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #1: THTensor_free + 0x2f (0x7fd76396ea1f in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #2: at::CUDAFloatTensor::~CUDAFloatTensor() + 0x9 (0x7fd7404d2a59 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #3: torch::autograd::generated::CudnnConvolutionBackward::~CudnnConvolutionBackward() + 0x5d (0x7fd7656d1e7d in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #4: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #5: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #6: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #7: + 0x7674a2 (0x7fd7654d44a2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #8: + 0x19aa5e (0x55e733ac1a5e in /home/cartur/.conda/envs/CenterNet_last/bin/python)
frame #9: std::_Sp_counted_deleter<torch::autograd::PyFunction*, Decref, std::allocator, (__gnu_cxx::_Lock_policy)2>::_M_dispose() + 0x2e (0x7fd7654d64fe in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #10: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #11: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #12: torch::autograd::generated::ThresholdBackward0::~ThresholdBackward0() + 0x62 (0x7fd7656d0ed2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #13: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #14: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #15: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #16: torch::autograd::generated::CudnnConvolutionBackward::~CudnnConvolutionBackward() + 0x73 (0x7fd7656d1e93 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #17: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #18: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #19: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #20: + 0x7674a2 (0x7fd7654d44a2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #21: + 0x19aa5e (0x55e733ac1a5e in /home/cartur/.conda/envs/CenterNet_last/bin/python)
frame #22: std::_Sp_counted_deleter<torch::autograd::PyFunction*, Decref, std::allocator, (__gnu_cxx::_Lock_policy)2>::_M_dispose() + 0x2e (0x7fd7654d64fe in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #23: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #24: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #25: torch::autograd::generated::ThresholdBackward0::~ThresholdBackward0() + 0x62 (0x7fd7656d0ed2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #26: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #27: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #28: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #29: torch::autograd::generated::CudnnConvolutionBackward::~CudnnConvolutionBackward() + 0x73 (0x7fd7656d1e93 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #30: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #31: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #32: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #33: + 0x7674a2 (0x7fd7654d44a2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #34: + 0x19aa5e (0x55e733ac1a5e in /home/cartur/.conda/envs/CenterNet_last/bin/python)
frame #35: std::_Sp_counted_deleter<torch::autograd::PyFunction*, Decref, std::allocator, (__gnu_cxx::_Lock_policy)2>::_M_dispose() + 0x2e (0x7fd7654d64fe in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #36: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #37: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #38: torch::autograd::generated::ThAddBackward::~ThAddBackward() + 0x3d (0x7fd7656ce8bd in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #39: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #40: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #41: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #42: torch::autograd::generated::ThresholdBackward0::~ThresholdBackward0() + 0x62 (0x7fd7656d0ed2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #43: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #44: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #45: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #46: torch::autograd::generated::ThAddBackward::~ThAddBackward() + 0x3d (0x7fd7656ce8bd in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #47: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #48: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #49: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #50: torch::autograd::generated::ThresholdBackward0::~ThresholdBackward0() + 0x62 (0x7fd7656d0ed2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #51: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #52: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #53: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #54: torch::autograd::generated::ThAddBackward::~ThAddBackward() + 0x3d (0x7fd7656ce8bd in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #55: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #56: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #57: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #58: torch::autograd::generated::ThresholdBackward0::~ThresholdBackward0() + 0x62 (0x7fd7656d0ed2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #59: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #60: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #61: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #62: torch::autograd::generated::ThAddBackward::~ThAddBackward() + 0x3d (0x7fd7656ce8bd in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #63: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)

opened by YijianLiu 11
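The `t >= 0 && t < n_classes` assertion in the log above is raised inside the NLL/cross-entropy CUDA kernel when a target pixel holds a value outside `[0, n_classes)` that is also not the ignore index; the later "invalid device pointer" errors are most likely knock-on effects of that failed assert. A minimal sketch for checking label maps on the CPU before training, assuming labels are integer NumPy arrays and that `ignore_index` is set to whatever ignore value your loss is configured with. `find_invalid_labels` and `remap_invalid_to_ignore` are hypothetical helpers, not part of the repository.

```python
import numpy as np


def find_invalid_labels(label: np.ndarray, num_classes: int, ignore_index: int) -> np.ndarray:
    """Return the sorted unique label values that would trip the
    `t >= 0 && t < n_classes` assertion.

    A value is valid if it lies in [0, num_classes) or equals ignore_index
    (the kernel skips ignored pixels before asserting).
    """
    values = np.unique(label)
    valid = ((values >= 0) & (values < num_classes)) | (values == ignore_index)
    return values[~valid]


def remap_invalid_to_ignore(label: np.ndarray, num_classes: int, ignore_index: int) -> np.ndarray:
    """Return a copy of `label` with every out-of-range value set to ignore_index."""
    out = label.copy()
    bad = ((out < 0) | (out >= num_classes)) & (out != ignore_index)
    out[bad] = ignore_index
    return out


lbl = np.array([[0, 18, 19], [255, 3, -1]])
print(find_invalid_labels(lbl, num_classes=19, ignore_index=255))  # [-1 19]
```

Running the check over the whole training list once is usually cheaper than debugging an asynchronous CUDA assert; whether to remap bad values or fix the label-conversion step depends on where they come from.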
stuck during training

I downloaded the pretrained_models and changed the GPU setting from (0,1,2,3) to (0,), but the training process gets stuck here:

Total Parameters: 65,773,843

Total Multiply Adds (For Convolution and Linear Layers only): 174.0439453125 GFLOPs

Number of Layers
Conv2d : 307 layers
InPlaceABNSync : 306 layers
ReLU : 269 layers
Bottleneck : 4 layers
BasicBlock : 104 layers
HighResolutionModule : 8 layers

Any idea why this happens?

opened by world4jason 10
Training on Custom Dataset with just Person class

I want to train HRNetV2-W18-Small-v2 from scratch on a custom dataset with only a person class. How can I do this?

My dataset is similar to the Supervisely Person Dataset, with a single person-class mask.

I do not want to fine-tune the existing pretrained model trained on Cityscapes, as I am afraid that would decrease my model's accuracy; that model is also trained on multiple classes.

How do I get a blank (untrained) HRNetV2-W18-Small-v2 model for a single person class?

opened by InternetMaster1 9
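For a single person class, one common setup is binary segmentation with two output classes (0 = background, 1 = person). On the model side this would presumably mean setting the number of classes in the experiment YAML to 2 (the `DATASET.NUM_CLASSES` key name is an assumption here) and not loading any pretrained checkpoint. The sketch below only covers the label side: it turns a person mask, assumed to mark person pixels with any non-zero value such as 255, into 0/1 class indices. `mask_to_labels` and `person_fraction` are hypothetical helpers, not part of the repository.

```python
import numpy as np

NUM_CLASSES = 2  # background + person (assumed binary setup)


def mask_to_labels(mask: np.ndarray) -> np.ndarray:
    """Convert a person mask into class indices: 0 = background, 1 = person.

    Any non-zero pixel (e.g. 255 in an 8-bit PNG mask) is treated as person.
    """
    return (mask != 0).astype(np.int64)


def person_fraction(labels: np.ndarray) -> float:
    """Fraction of pixels labeled as person (labels must contain only 0/1)."""
    return float(labels.mean())


mask = np.array([[0, 255], [255, 0]], dtype=np.uint8)
labels = mask_to_labels(mask)
print(labels.tolist(), person_fraction(labels))  # [[0, 1], [1, 0]] 0.5
```

The person fraction is mainly a quick sanity check that the masks were read and binarized as expected.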
Initialization of OCR layers

Hello,

I was wondering why the layers of the OCR module are excluded from the init_weights function, and what the proper way to initialize them is. I am writing my own implementation of the OCR module and am struggling with this point. After training, the weights of the OCR layers are relatively high (~1), which seems to lead to faulty results.

Regards

opened by carhartt21 8
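If the OCR layers are skipped by `init_weights`, they simply keep PyTorch's default per-layer initialization. One option, sketched below, is to initialize them explicitly in the same style as the backbone: small-std normal weights for convolutions, and weight = 1, bias = 0 for batch norm. This is an assumption rather than the authors' recommendation: `init_conv_bn` is a hypothetical helper, the std of 0.001 only mirrors what HRNet's backbone initialization appears to use, and synchronized BN classes such as InPlaceABNSync would need their own `isinstance` branch.

```python
import torch
import torch.nn as nn


def init_conv_bn(module: nn.Module, std: float = 0.001) -> int:
    """Explicitly initialize every Conv2d and BatchNorm2d inside `module`.

    Conv2d: weight ~ N(0, std^2), bias (if present) = 0.
    BatchNorm2d: weight (gamma) = 1, bias (beta) = 0.
    Returns the number of layers that were initialized.
    """
    count = 0
    for m in module.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.normal_(m.weight, mean=0.0, std=std)
            if m.bias is not None:
                nn.init.zeros_(m.bias)
            count += 1
        elif isinstance(m, nn.BatchNorm2d):
            nn.init.ones_(m.weight)
            nn.init.zeros_(m.bias)
            count += 1
    return count


# Stand-in for an OCR sub-block; the channel sizes are made up for illustration.
ocr_like = nn.Sequential(
    nn.Conv2d(64, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(inplace=True),
)
n = init_conv_bn(ocr_like)
print(n)  # 2
```

Calling such a helper on the OCR sub-modules right after `init_weights` keeps the rest of the model untouched; whether a different std (or Kaiming init) works better for the OCR head is something to verify empirically.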
Struggling with ninja when running test.py

I ran into this issue and tried to solve it, but failed. Does anyone have experience with it? I followed https://github.com/HRNet/HRNet-Semantic-Segmentation/issues/33 and https://github.com/HRNet/HRNet-Semantic-Segmentation/issues/25

jiapy@adminroot:~/workspace/tools/ninja/ninja-1.10.0$ source /home/jiapy/virtualEnv/py3.6torch1.1/bin/activate
(py3.6torch1.1) jiapy@adminroot:~/workspace/tools/ninja/ninja-1.10.0$ ninja --version
1.10.0
(py3.6torch1.1) jiapy@adminroot:~/workspace/tools/ninja/ninja-1.10.0$ ninja -v
ninja: no work to do.
(py3.6torch1.1) jiapy@adminroot:~/workspace/tools/ninja/ninja-1.10.0$ sh /home/jiapy/workspace/segmentation/HRNet-Semantic-Segmentation/test.sh
Traceback (most recent call last):
  File "/home/jiapy/virtualEnv/py3.6torch1.1/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 949, in _build_extension_module
    check=True)
  File "/usr/local/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jiapy/workspace/segmentation/HRNet-Semantic-Segmentation/tools/test.py", line 25, in <module>
    import models
  File "/home/jiapy/workspace/segmentation/HRNet-Semantic-Segmentation/tools/../lib/models/__init__.py", line 11, in <module>
    import models.seg_hrnet
  File "/home/jiapy/workspace/segmentation/HRNet-Semantic-Segmentation/tools/../lib/models/seg_hrnet.py", line 22, in <module>
    from .sync_bn.inplace_abn.bn import InPlaceABNSync
  File "/home/jiapy/workspace/segmentation/HRNet-Semantic-Segmentation/tools/../lib/models/sync_bn/__init__.py", line 1, in <module>
    from .inplace_abn import bn
  File "/home/jiapy/workspace/segmentation/HRNet-Semantic-Segmentation/tools/../lib/models/sync_bn/inplace_abn/__init__.py", line 1, in <module>
    from .bn import ABN, InPlaceABN, InPlaceABNSync
  File "/home/jiapy/workspace/segmentation/HRNet-Semantic-Segmentation/tools/../lib/models/sync_bn/inplace_abn/bn.py", line 14, in <module>
    from functions import *
  File "/home/jiapy/workspace/segmentation/HRNet-Semantic-Segmentation/lib/models/sync_bn/inplace_abn/functions.py", line 16, in <module>
    extra_cuda_cflags=["--expt-extended-lambda"])
  File "/home/jiapy/virtualEnv/py3.6torch1.1/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 644, in load
    is_python_module)
  File "/home/jiapy/virtualEnv/py3.6torch1.1/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 813, in jit_compile
    with_cuda=with_cuda)
  File "/home/jiapy/virtualEnv/py3.6torch1.1/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 866, in write_ninja_file_and_build
    _build_extension_module(name, build_directory, verbose)
  File "/home/jiapy/virtualEnv/py3.6torch1.1/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 962, in _build_extension_module
    raise RuntimeError(message)
RuntimeError: Error building extension 'inplace_abn':
[1/3] :/usr/local/cuda-10.0:/usr/local/cuda-10.0/bin/nvcc -DTORCH_EXTENSION_NAME=inplace_abn -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/jiapy/virtualEnv/py3.6torch1.1/lib/python3.6/site-packages/torch/include -isystem /home/jiapy/virtualEnv/py3.6torch1.1/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/jiapy/virtualEnv/py3.6torch1.1/lib/python3.6/site-packages/torch/include/TH -isystem /home/jiapy/virtualEnv/py3.6torch1.1/lib/python3.6/site-packages/torch/include/THC -isystem :/usr/local/cuda-10.0:/usr/local/cuda-10.0/include -isystem /home/jiapy/virtualEnv/py3.6torch1.1/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' --expt-extended-lambda -std=c++11 -c /home/jiapy/workspace/segmentation/HRNet-Semantic-Segmentation/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu -o inplace_abn_cuda.cuda.o
FAILED: inplace_abn_cuda.cuda.o
:/usr/local/cuda-10.0:/usr/local/cuda-10.0/bin/nvcc -DTORCH_EXTENSION_NAME=inplace_abn -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/jiapy/virtualEnv/py3.6torch1.1/lib/python3.6/site-packages/torch/include -isystem /home/jiapy/virtualEnv/py3.6torch1.1/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/jiapy/virtualEnv/py3.6torch1.1/lib/python3.6/site-packages/torch/include/TH -isystem /home/jiapy/virtualEnv/py3.6torch1.1/lib/python3.6/site-packages/torch/include/THC -isystem :/usr/local/cuda-10.0:/usr/local/cuda-10.0/include -isystem /home/jiapy/virtualEnv/py3.6torch1.1/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' --expt-extended-lambda -std=c++11 -c /home/jiapy/workspace/segmentation/HRNet-Semantic-Segmentation/lib/models/sync_bn/inplace_abn/src/inplace_abn_cuda.cu -o inplace_abn_cuda.cuda.o
/bin/sh: 1: :/usr/local/cuda-10.0:/usr/local/cuda-10.0/bin/nvcc: not found
[2/3] c++ -MMD -MF inplace_abn_cpu.o.d -DTORCH_EXTENSION_NAME=inplace_abn -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/jiapy/virtualEnv/py3.6torch1.1/lib/python3.6/site-packages/torch/include -isystem /home/jiapy/virtualEnv/py3.6torch1.1/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/jiapy/virtualEnv/py3.6torch1.1/lib/python3.6/site-packages/torch/include/TH -isystem /home/jiapy/virtualEnv/py3.6torch1.1/lib/python3.6/site-packages/torch/include/THC -isystem :/usr/local/cuda-10.0:/usr/local/cuda-10.0/include -isystem /home/jiapy/virtualEnv/py3.6torch1.1/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -O3 -c /home/jiapy/workspace/segmentation/HRNet-Semantic-Segmentation/lib/models/sync_bn/inplace_abn/src/inplace_abn_cpu.cpp -o inplace_abn_cpu.o
FAILED: inplace_abn_cpu.o
c++ -MMD -MF inplace_abn_cpu.o.d -DTORCH_EXTENSION_NAME=inplace_abn -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/jiapy/virtualEnv/py3.6torch1.1/lib/python3.6/site-packages/torch/include -isystem /home/jiapy/virtualEnv/py3.6torch1.1/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/jiapy/virtualEnv/py3.6torch1.1/lib/python3.6/site-packages/torch/include/TH -isystem /home/jiapy/virtualEnv/py3.6torch1.1/lib/python3.6/site-packages/torch/include/THC -isystem :/usr/local/cuda-10.0:/usr/local/cuda-10.0/include -isystem /home/jiapy/virtualEnv/py3.6torch1.1/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -O3 -c /home/jiapy/workspace/segmentation/HRNet-Semantic-Segmentation/lib/models/sync_bn/inplace_abn/src/inplace_abn_cpu.cpp -o inplace_abn_cpu.o
/home/jiapy/workspace/segmentation/HRNet-Semantic-Segmentation/lib/models/sync_bn/inplace_abn/src/inplace_abn_cpu.cpp: In function ‘std::vector<at::Tensor> backward_cpu(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, bool, float)’:
/home/jiapy/workspace/segmentation/HRNet-Semantic-Segmentation/lib/models/sync_bn/inplace_abn/src/inplace_abn_cpu.cpp:82:34: error: could not convert ‘z.at::Tensor::type()’ from ‘at::DeprecatedTypeProperties’ to ‘c10::IntArrayRef {aka c10::ArrayRef}’
 auto dweight = at::empty(z.type(), {0});
 ^
/home/jiapy/workspace/segmentation/HRNet-Semantic-Segmentation/lib/models/sync_bn/inplace_abn/src/inplace_abn_cpu.cpp:83:32: error: could not convert ‘z.at::Tensor::type()’ from ‘at::DeprecatedTypeProperties’ to ‘c10::IntArrayRef {aka c10::ArrayRef}’
 auto dbias = at::empty(z.type(), {0});
 ^
/home/jiapy/workspace/segmentation/HRNet-Semantic-Segmentation/lib/models/sync_bn/inplace_abn/src/inplace_abn_cpu.cpp:89:29: error: could not convert ‘{dx, dweight, dbias}’ from ‘<brace-enclosed initializer list>’ to ‘std::vector<at::Tensor>’
 return {dx, dweight, dbias};
 ^
ninja: build stopped: subcommand failed.

opened by dzyjjpy 8
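In the build log above, nvcc is invoked as ":/usr/local/cuda-10.0:/usr/local/cuda-10.0/bin/nvcc", which suggests CUDA_HOME was set to a colon-separated list (for example by appending to the variable as if it were a PATH entry) rather than to a single directory, so /bin/sh cannot find the compiler. The separate C++ errors around `at::empty(z.type(), {0})` suggest that the inplace_abn sources in this checkout target an older PyTorch C++ API than the installed 1.1; the page lists a dedicated PyTorch 1.1 branch. A minimal standard-library sketch for checking the first problem (`check_cuda_home` is a hypothetical helper):

```python
import os


def check_cuda_home(value: str) -> list:
    """Return human-readable problems with a CUDA_HOME value.

    torch.utils.cpp_extension joins CUDA_HOME with bin/nvcc, so the value
    should be a single directory that contains bin/nvcc.
    """
    problems = []
    if not value:
        problems.append("CUDA_HOME is empty")
        return problems
    if os.pathsep in value:
        problems.append(
            f"CUDA_HOME contains {os.pathsep!r}; it looks like a path list, not one directory"
        )
    nvcc = os.path.join(value, "bin", "nvcc")
    if not os.path.isfile(nvcc):
        problems.append(f"{nvcc} does not exist")
    return problems


print(check_cuda_home(os.environ.get("CUDA_HOME", "")))
```

With the environment from the log, the check would flag the path-list problem; exporting CUDA_HOME as the single directory /usr/local/cuda-10.0 before rebuilding should let the extension build find nvcc.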
Unable to reproduce `seg_hrnet_w18_small_v1`
Thanks for 27488d4; the configuration file is very helpful. That said, training on 4 GPUs as prescribed, I'm unable to reproduce the Cityscapes validation mIoU of 70.3% listed at https://github.com/HRNet/HRNet-Semantic-Segmentation#small-models (I attained 65.21%).

Is https://github.com/HRNet/HRNet-Semantic-Segmentation/blob/master/experiments/cityscapes/seg_hrnet_w18_small_v1_512x1024_sgd_lr1e-2_wd5e-4_bs_12_epoch484.yaml verbatim the file used to produce 70.3% or does it need further hyperparameter tuning? (I'm on the pytorch-v1.1 branch.)

In case it's helpful (although I doubt it is very informative), here are the per-class IoUs for the retrained w18-v1 model:

Loss: 0.179, MeanIU: 0.6509, Best_mIoU: 0.6521 [0.97245895 0.79921705 0.8969752 0.43651182 0.47062117 0.56336364 0.57983322 0.68906234 0.91533262 0.60986547 0.93415257 0.74804671 0.46804914 0.91671634 0.4241423 0.58802203 0.24108752 0.41514963 0.69802723]
opened by alvinwan 7
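For comparing numbers like these: mean IoU is the unweighted mean of the per-class IoUs, where each class's IoU is TP / (TP + FP + FN) taken from a confusion matrix accumulated over the validation set. A minimal NumPy sketch (the helper names are hypothetical, and the ignore value of 255 is an assumption); averaging the 19 per-class values listed above gives about 0.6509, matching the reported MeanIU.

```python
import numpy as np


def confusion_matrix(pred, gt, num_classes, ignore_index=255):
    """Accumulate a num_classes x num_classes matrix; rows = ground truth, cols = prediction."""
    keep = gt != ignore_index
    idx = gt[keep].astype(np.int64) * num_classes + pred[keep].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def per_class_iou(cm):
    tp = np.diag(cm).astype(np.float64)
    gt_total = cm.sum(axis=1)          # TP + FN
    pred_total = cm.sum(axis=0)        # TP + FP
    union = gt_total + pred_total - tp
    return tp / np.maximum(union, 1)   # avoid 0/0 for classes that never appear


# Per-class IoUs reported in the issue above (19 Cityscapes classes).
reported = np.array([0.97245895, 0.79921705, 0.8969752, 0.43651182, 0.47062117,
                     0.56336364, 0.57983322, 0.68906234, 0.91533262, 0.60986547,
                     0.93415257, 0.74804671, 0.46804914, 0.91671634, 0.4241423,
                     0.58802203, 0.24108752, 0.41514963, 0.69802723])
print(round(float(reported.mean()), 4))  # 0.6509
```

Because every class counts equally, the weak classes in the list (for example the 0.24 entry) pull the mean down much more than their pixel share would suggest.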
transform the model into ScriptModules

When I convert the HRNet model into a ScriptModule with "traced_script_module = torch.jit.trace(kp_model, example)" followed by "traced_script_module.save("hrnet_model.pt")", the error "assert(isinstance(orig, torch.nn.Module)) AssertionError" occurs. I found that it is caused by the , any suggestions?

opened by toyal 7
need your help

@sunke123 Thank you very much for your work. Your code achieves great performance on Cityscapes. Could you please share the result files that you submitted to the Cityscapes benchmark?

opened by sde123 7
Where is the variable "border_padding" defined?

Hi, in the file function.py, the function testval contains: image, label, _, name, *border_padding = batch

but I cannot find where border_padding is defined. Could you please tell me about it? Thanks so much!

opened by Yaoxingtian 0
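`border_padding` does not need a separate definition: the leading `*` in `image, label, _, name, *border_padding = batch` is Python's extended iterable unpacking, which binds the first four items by position and collects whatever remains into a list (an empty list when the batch has exactly four items). How many items a batch has depends on what the dataset's `__getitem__` returns. A minimal illustration with plain tuples standing in for a batch (the values and `split_batch` are made up):

```python
def split_batch(batch):
    # Same starred-assignment pattern as in testval: the first four items are
    # bound by position, and any extra items are gathered into a list.
    image, label, _, name, *border_padding = batch
    return name, border_padding


print(split_batch(("img", "lbl", (512, 1024), "frame_0001")))
# ('frame_0001', [])
print(split_batch(("img", "lbl", (512, 1024), "frame_0001", [0, 0, 16, 16])))
# ('frame_0001', [[0, 0, 16, 16]])
```

Code that consumes such a variable can therefore test `if border_padding:` before indexing into it.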
Bug fix. Change softmax dim.

In line 64, change the softmax dim from 2 to 1.
According to this line, probs = F.softmax(self.scale * probs, dim=2)# batch x k x hw

In this code, the input dimension is [batch_size, num_class, fh*fw]. The softmax is taken over dimension 2, which means the probabilities sum to one over the spatial positions of the feature map (fh*fw).

However, I think the softmax dimension should be 1, so that the probabilities sum to one over the class dimension (num_class).

The corrected code is as follows: probs = F.softmax(self.scale * probs, dim=1)# batch x num_class x hw

By the way, I had reported this in an issue but received no answer. I also ran a simple comparison experiment, and the results show that dim=1 converges faster and reaches a better mIoU.

opened by yannqi 0
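The two choices can be compared numerically. With probs shaped batch x K x HW, softmax over dim 2 makes each region's weights sum to 1 over pixels, so multiplying by pixel features (batch x HW x C) gives a weighted average of pixel features per region; softmax over dim 1 instead makes each pixel's weights sum to 1 over the K regions, so the per-region total weight is no longer fixed and the product becomes an unnormalized weighted sum. This sketch only shows what each normalization computes; which one the OCR formulation intends is worth checking against the paper's definition of the object region representation. NumPy stands in for `torch.nn.functional.softmax`, and all shapes and values are made up.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along `axis`.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


rng = np.random.default_rng(0)
b, k, hw, c = 1, 3, 5, 4                 # batch, regions/classes, pixels, channels
logits = rng.normal(size=(b, k, hw))     # batch x k x hw, as in the gather step
feats = rng.normal(size=(b, hw, c))      # batch x hw x c pixel features

p_dim2 = softmax(logits, axis=2)         # each region's weights sum to 1 over pixels
p_dim1 = softmax(logits, axis=1)         # each pixel's weights sum to 1 over regions

ctx_dim2 = p_dim2 @ feats                # batch x k x c: weighted mean of pixel features
ctx_dim1 = p_dim1 @ feats                # batch x k x c: weighted sum, not normalized over pixels

print(np.allclose(p_dim2.sum(axis=2), 1.0), np.allclose(p_dim1.sum(axis=1), 1.0))  # True True
```

With dim 2, each region vector stays inside the per-channel range of the pixel features; with dim 1, the total weight over all regions equals the number of pixels, so the region vectors scale with image size.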
Bug report / Help wanted:

In this line: https://github.com/HRNet/HRNet-Semantic-Segmentation/blob/HRNet-OCR/lib/models/seg_hrnet_ocr.py#L64

Question: According to this line, probs = F.softmax(self.scale * probs, dim=2)# batch x k x hw
In this code, the input dimension is [batch_size, num_class, fh*fw]. The softmax is taken over dimension 2, which means the probabilities sum to one over the spatial positions of the feature map (fh*fw).

However, I think the softmax dimension should be 1, so that the probabilities sum to one over the class dimension (num_class).

The corrected code is as follows: probs = F.softmax(self.scale * probs, dim=1)# batch x num_class x hw

opened by yannqi 0
Is there a version of HRNet called HRNetv2-W28?

I was going through this paper and came across the model "HRNetv2-W28". I searched for it online but could not find it. Does anyone know about it?

opened by sakethbachu 1
Why not just use the GT mask for soft_object_regions?

I'm a little confused: the paper says that using the GT mask gives the best result, so why not just use the GT mask for Object_region instead of a learnable conv_block in the code?

opened by wcyjerry 0
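One practical constraint is that ground-truth masks exist only for labeled images, not at inference time, so a region map built from them can serve as an oracle or training-time experiment but cannot replace the learned one at test time. For such an experiment, the label map can be turned into a hard region map with the same batch x K x HW layout as the soft regions. A minimal NumPy sketch, assuming ignored pixels (255 here, an assumption) should belong to no region; `gt_to_regions` is a hypothetical helper:

```python
import numpy as np


def gt_to_regions(label, num_classes, ignore_index=255):
    """label: batch x H x W integer map -> batch x K x (H*W) one-hot region map.

    Pixels equal to ignore_index get an all-zero column (they belong to no region).
    Valid pixels are assumed to hold values in [0, num_classes).
    """
    b = label.shape[0]
    flat = label.reshape(b, -1)                                  # batch x HW
    regions = np.zeros((b, num_classes, flat.shape[1]), dtype=np.float32)
    bi, pi = np.nonzero(flat != ignore_index)                    # indices of labeled pixels
    regions[bi, flat[bi, pi], pi] = 1.0
    return regions


lab = np.array([[[0, 1], [255, 1]]])     # 1 x 2 x 2 label map
print(gt_to_regions(lab, num_classes=2)[0].tolist())
# [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
```

If such a map is fed into the gather step, it would need the same normalization as the learned regions so that the two are directly comparable.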

Owner

HRNet

Official Implementation of HRDA: Context-Aware High-Resolution Domain-Adaptive Semantic Segmentation

This is the unofficial code of Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes, which achieves a state-of-the-art trade-off between accuracy and speed on Cityscapes and CamVid without using inference acceleration or extra data.

Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging

Official repository for "Restormer: Efficient Transformer for High-Resolution Image Restoration". SOTA for motion deblurring, image deraining, denoising (Gaussian/real data), and defocus deblurring.

Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding

This is an official implementation of the High-Resolution Transformer for Dense Prediction.

Recall Loss for Semantic Segmentation (This repo implements the paper: Recall Loss for Semantic Segmentation)

Implementation of CVPR 2020 Dual Super-Resolution Learning for Semantic Segmentation

nnFormer: Interleaved Transformer for Volumetric Segmentation. Code for the paper "nnFormer: Interleaved Transformer for Volumetric Segmentation".

Siamese-nn-semantic-text-similarity - A repository containing comprehensive Neural Networks based PyTorch implementations for the semantic text similarity task

Code for the paper "A High-precision Semantic Segmentation Method Combining Adversarial Learning and Attention Mechanism"

E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation

Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation, CVPR 2018

A Fast and Stable GAN for Small and High Resolution Imagesets - pytorch

TACTO: A Fast, Flexible and Open-source Simulator for High-Resolution Vision-based Tactile Sensors

Unofficial pytorch implementation of the paper "Dynamic High-Pass Filtering and Multi-Spectral Attention for Image Super-Resolution"

Official PyTorch implementation of Segmenter: Transformer for Semantic Segmentation

Official code for "Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer" (ICCV 2021).

"Segmenter: Transformer for Semantic Segmentation" reproduced via mmsegmentation