Wider or Deeper: Revisiting the ResNet Model for Visual Recognition

Zifeng Wu

Last update: Dec 12, 2022

Overview

ademxapp

Visual applications by the University of Adelaide

In designing our Model A, we did not over-optimize its structure for efficiency unless it was necessary, which led us to a high-performance model without non-trivial building blocks. Besides, by doing so, we anticipate that this model and its trivial variants will perform well when they are fine-tuned for new tasks, considering their better spatial efficiency and larger model sizes compared to conventional ResNet models.

In this work, we try to find a proper depth for ResNets, without grid-searching the whole space, especially when it is too costly to do so, e.g., on the ILSVRC 2012 classification dataset. For more details, refer to our report: Wider or Deeper: Revisiting the ResNet Model for Visual Recognition.

This code is a refactored version of the one that we used in the competition, and has not yet been tested extensively, so feel free to open an issue if you find any problem.

To use, first install MXNet.

Updates

Recent updates
- Model A1 trained on Cityscapes
- Model A1 trained on VOC
- Training code for semantic image segmentation
- Training code for image classification on ILSVRC 2012 (Still needs to be evaluated.)

History
- Results on VOC using COCO for pre-training
- Fix the bug in testing that resulted from changing the EPS in BatchNorm layers
- Model A1 for ADE20K trained using the train set with testing code
- Segmentation results with multi-scale testing on VOC and Cityscapes
- Model A and Model A1 for ILSVRC with testing code
- Segmentation results with single-scale testing on VOC and Cityscapes

Image classification

Pre-trained models

Download the ILSVRC 2012 classification val set (6.3 GB), and put the extracted images into the directory:
```
data/ilsvrc12/ILSVRC2012_val/
```
Download the models as below, and put them into the directory:
```
models/
```
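Large downloads are worth verifying before use. The sketch below compares downloaded files against digests from the repository's checksum.md (quoted in the comments further down). It assumes those 32-character hexadecimal strings are MD5 digests and that Python 3 is available; `md5_of_file` and `check_models` are hypothetical helper names, not part of this code base.

```python
import hashlib
from pathlib import Path

# Digests copied from checksum.md; assumed (not confirmed) to be MD5.
EXPECTED_MD5 = {
    "ilsvrc-cls_rna-a_cls1000_ep-0001.params": "2421c1945b6797cecd3f89db14ca73f6",
    "ilsvrc-cls_rna-a1_cls1000_ep-0001.params": "ff21f45d6bf03284100dcbec571edfad",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large .params files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_models(model_dir, expected=EXPECTED_MD5):
    """Return {filename: status}, where status is 'ok', 'mismatch' or 'missing'."""
    results = {}
    for name, want in expected.items():
        path = Path(model_dir) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == want:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    for name, status in check_models("models").items():
        print(f"{status:9s} {name}")
```

A `mismatch` most likely means a truncated download, so fetching the file again is a reasonable first step.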

Check the classification performance of pre-trained models on the ILSVRC 2012 val set:

python iclass/ilsvrc.py --data-root data/ilsvrc12 --output output --batch-images 10 --phase val --weights models/ilsvrc-cls_rna-a_cls1000_ep-0001.params --split val --test-scales 320 --gpus 0 --no-choose-interp-method --pool-top-infer-style caffe

python iclass/ilsvrc.py --data-root data/ilsvrc12 --output output --batch-images 10 --phase val --weights models/ilsvrc-cls_rna-a1_cls1000_ep-0001.params --split val --test-scales 320 --gpus 0 --no-choose-interp-method

Results on the ILSVRC 2012 val set tested with a single scale (320, without flipping):

model|top-1 error (%)|top-5 error (%)|download
:---:|:---:|:---:|:---:
[Model A](https://cdn.rawgit.com/itijyou/ademxapp/master/misc/ilsvrc_model_a.pdf)|19.20|4.73|[aar](https://cloudstor.aarnet.edu.au/plus/index.php/s/V7dncO4H0ijzeRj)
[Model A1](https://cdn.rawgit.com/itijyou/ademxapp/master/misc/ilsvrc_model_a1.pdf)|19.54|4.75|[aar](https://cloudstor.aarnet.edu.au/plus/index.php/s/NOPhJ247fhVDnZH)

Note: Due to a change of MXNet in padding at pooling layers, some of the computed feature maps in Model A will have different sizes from those stated in our report. However, this has no effect on Model A1, which always uses convolution layers (instead of pooling layers) for down-sampling. So, in most cases, just use Model A1, which was initialized from Model A, and tuned for 45k extra iterations.
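
As an illustration of how a padding or rounding convention at a pooling layer can change feature-map sizes, the sketch below compares floor and ceil rounding of the output length. It is a generic formula, not a statement of what exactly changed in MXNet, and the kernel and stride values are hypothetical rather than taken from Model A. Framework-specific corner cases (for example, extra clipping when padding is used) are ignored.

```python
import math


def pooled_size(size, kernel, stride, pad=0, ceil_mode=False):
    """Output length along one spatial axis of a pooling layer."""
    steps = (size + 2 * pad - kernel) / stride
    rounded = math.ceil(steps) if ceil_mode else math.floor(steps)
    return rounded + 1


# Hypothetical 3x3 pooling with stride 2 and no padding.
for size in (112, 113):
    floor_out = pooled_size(size, kernel=3, stride=2)
    ceil_out = pooled_size(size, kernel=3, stride=2, ceil_mode=True)
    print(size, floor_out, ceil_out)
# 112 55 56
# 113 56 56
```

The two conventions agree only when the stride divides the span evenly, which is why a strided convolution with a fixed convention, as used for down-sampling in Model A1, avoids the discrepancy.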

New models

Find a machine with 4 GPUs, each with at least 11 GB of memory.
Download the ILSVRC 2012 classification train set (138 GB), and put the extracted images into the directory:
```
data/ilsvrc12/ILSVRC2012_train/
```
with the following structure:
```
ILSVRC2012_train
|-- n01440764
|-- n01443537
|-- ...
`-- n15075141
```
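Before starting a multi-week training run, it can help to confirm that the extracted train set matches the layout above. The following is a minimal sketch assuming Python 3; `summarize_train_dir` is a hypothetical helper, and the pattern only checks that folder names look like WordNet IDs (`n` followed by eight digits).

```python
import re
from pathlib import Path

WNID_PATTERN = re.compile(r"^n\d{8}$")  # e.g. n01440764


def summarize_train_dir(root):
    """Count class folders and list those that are misnamed or empty."""
    class_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    bad_names = [p.name for p in class_dirs if not WNID_PATTERN.match(p.name)]
    empty = [p.name for p in class_dirs if not any(p.iterdir())]
    return {"num_classes": len(class_dirs), "bad_names": bad_names, "empty": empty}


# Example call (the directory must exist):
# report = summarize_train_dir("data/ilsvrc12/ILSVRC2012_train")
# print(report)
```

For the full ILSVRC 2012 train set, `num_classes` should be 1000 and both lists should be empty.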

Train a new Model A from scratch, and check its performance:

python iclass/ilsvrc.py --gpus 0,1,2,3 --data-root data/ilsvrc12 --output output --model ilsvrc-cls_rna-a_cls1000 --batch-images 256 --crop-size 224 --lr-type linear --base-lr 0.1 --to-epoch 90 --kvstore local --prefetch-threads 8 --prefetcher process --backward-do-mirror

python iclass/ilsvrc.py --data-root data/ilsvrc12 --output output --batch-images 10 --phase val --weights output/ilsvrc-cls_rna-a_cls1000_ep-0090.params --split val --test-scales 320 --gpus 0

Tune a Model A1 from our released Model A, and check its performance:

python iclass/ilsvrc.py --gpus 0,1,2,3 --data-root data/ilsvrc12 --output output --model ilsvrc-cls_rna-a1_cls1000_from-a --batch-images 256 --crop-size 224 --weights models/ilsvrc-cls_rna-a_cls1000_ep-0001.params --lr-type linear --base-lr 0.01 --to-epoch 9 --kvstore local --prefetch-threads 8 --prefetcher process --backward-do-mirror

python iclass/ilsvrc.py --data-root data/ilsvrc12 --output output --batch-images 10 --phase val --weights output/ilsvrc-cls_rna-a1_cls1000_from-a_ep-0009.params --split val --test-scales 320 --gpus 0

Or train a new Model A1 from scratch, and check its performance:

python iclass/ilsvrc.py --gpus 0,1,2,3 --data-root data/ilsvrc12 --output output --model ilsvrc-cls_rna-a1_cls1000 --batch-images 256 --crop-size 224 --lr-type linear --base-lr 0.1 --to-epoch 90 --kvstore local --prefetch-threads 8 --prefetcher process --backward-do-mirror

python iclass/ilsvrc.py --data-root data/ilsvrc12 --output output --batch-images 10 --phase val --weights output/ilsvrc-cls_rna-a1_cls1000_ep-0090.params --split val --test-scales 320 --gpus 0

Training took more than 40 days on our workstation with 4 Maxwell GTX Titan cards. So, be patient or try smaller models as described in our report.

Note: The best setting (prefetch-threads and prefetcher) for efficiency can vary depending on the circumstances (the provided CPUs, GPUs, and filesystem).

Note: This code may not accurately reproduce our reported results, since there are subtle differences in implementation, e.g., different cropping strategies, interpolation methods, and padding strategies.
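
When comparing your own runs against the reported numbers, it may help to recompute the top-1 and top-5 errors from raw class scores. The sketch below shows the standard definition in plain Python; it is not the evaluation code in `iclass/ilsvrc.py`, and `topk_error` is a hypothetical helper name.

```python
def topk_error(scores, labels, k):
    """Percentage of samples whose true label is not among the k highest scores."""
    misses = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return 100.0 * misses / len(labels)


# Toy example with 3 classes and 4 samples.
scores = [
    [0.1, 0.7, 0.2],  # highest score: class 1
    [0.5, 0.3, 0.2],  # highest score: class 0
    [0.2, 0.3, 0.5],  # highest score: class 2
    [0.6, 0.3, 0.1],  # highest score: class 0
]
labels = [1, 1, 2, 2]
print(topk_error(scores, labels, k=1))  # 50.0
print(topk_error(scores, labels, k=2))  # 25.0
```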

Semantic image segmentation

We show the effectiveness of our models (as pre-trained features) on semantic image segmentation, using plain dilated FCNs initialized from our models. Several A1 models tuned on the train sets of PASCAL VOC, Cityscapes and ADE20K are available.

To use, download and put them into the directory:
```
models/
```

PASCAL VOC 2012:

Download the PASCAL VOC 2012 dataset (2 GB), and put the extracted images into the directory:
```
data/VOCdevkit/VOC2012
```
with the following structure:
```
VOC2012
|-- JPEGImages
|-- SegmentationClass
`-- ...
```

Check the performance of the pre-trained models:

python issegm/voc.py --data-root data/VOCdevkit --output output --phase val --weights models/voc_rna-a1_cls21_s8_ep-0001.params --split val --test-scales 500 --test-flipping --gpus 0

python issegm/voc.py --data-root data/VOCdevkit --output output --phase val --weights models/voc_rna-a1_cls21_s8_coco_ep-0001.params --split val --test-scales 500 --test-flipping --gpus 0

Results on the val set:

model|training data|testing scale|mean IoU (%)|download
:---|:---:|:---:|:---:|:---:
Model A1, 2 conv.|VOC; SBD|500|80.84|[aar](https://cloudstor.aarnet.edu.au/plus/index.php/s/YqNptRcboMD44Kd)
Model A1, 2 conv.|VOC; SBD; COCO|500|82.86|[aar](https://cloudstor.aarnet.edu.au/plus/index.php/s/JKWePbLPlpfRDW4)

Results on the test set:

model|training data|testing scale|mean IoU (%)
:---|:---:|:---:|:---:
Model A1, 2 conv.|VOC; SBD|500|[82.5](http://host.robots.ox.ac.uk:8080/anonymous/H0KLZK.html)
Model A1, 2 conv.|VOC; SBD|multiple|[83.1](http://host.robots.ox.ac.uk:8080/anonymous/BEWE9S.html)
Model A1, 2 conv.|VOC; SBD; COCO|multiple|[84.9](http://host.robots.ox.ac.uk:8080/anonymous/JU1PXP.html)

Cityscapes:

Download the Cityscapes dataset, and put the extracted images into the directory:
```
data/cityscapes
```
with the following structure:
```
cityscapes
|-- gtFine
`-- leftImg8bit
```

Clone the official Cityscapes toolkit:

git clone https://github.com/mcordts/cityscapesScripts.git data/cityscapesScripts

Check the performance of the pre-trained model:

python issegm/voc.py --data-root data/cityscapes --output output --phase val --weights models/cityscapes_rna-a1_cls19_s8_ep-0001.params --split val --test-scales 2048 --test-flipping --gpus 0

Tune a Model A1, and check its performance:

python issegm/voc.py --gpus 0,1,2,3 --split train --data-root data/cityscapes --output output --model cityscapes_rna-a1_cls19_s8 --batch-images 16 --crop-size 500 --origin-size 2048 --scale-rate-range 0.7,1.3 --weights models/ilsvrc-cls_rna-a1_cls1000_ep-0001.params --lr-type fixed --base-lr 0.0016 --to-epoch 140 --kvstore local --prefetch-threads 8 --prefetcher process --cache-images 0 --backward-do-mirror

python issegm/voc.py --gpus 0,1,2,3 --split train --data-root data/cityscapes --output output --model cityscapes_rna-a1_cls19_s8_x1-140 --batch-images 16 --crop-size 500 --origin-size 2048 --scale-rate-range 0.7,1.3 --weights output/cityscapes_rna-a1_cls19_s8_ep-0140.params --lr-type linear --base-lr 0.0008 --to-epoch 64 --kvstore local --prefetch-threads 8 --prefetcher process --cache-images 0 --backward-do-mirror

python issegm/voc.py --data-root data/cityscapes --output output --phase val --weights output/cityscapes_rna-a1_cls19_s8_x1-140_ep-0064.params --split val --test-scales 2048 --test-flipping --gpus 0

Results on the val set:

model|training data|testing scale|mean IoU (%)|download
:---|:---:|:---:|:---:|:---:
Model A1, 2 conv.|fine|1024x2048|78.08|[aar](https://cloudstor.aarnet.edu.au/plus/index.php/s/2hbvpro6J4XKVIu)

Results on the test set:

model|training data|testing scale|class IoU (%)|class iIoU (%)|category IoU (%)|category iIoU (%)
:---|:---:|:---:|:---:|:---:|:---:|:---:
Model A2, 2 conv.|fine|1024x2048|78.4|59.1|90.9|81.1
Model A2, 2 conv.|fine|multiple|79.4|58.0|91.0|80.1
Model A2, 2 conv.|fine; coarse|1024x2048|79.9|59.7|91.2|80.8
Model A2, 2 conv.|fine; coarse|multiple|80.6|57.8|91.0|79.1

For more information, refer to the official leaderboard.

Note: Model A2 was initialized from Model A, and tuned for 45k extra iterations using the Places data in ILSVRC 2016.
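
The IoU-based scores in the segmentation tables follow the usual intersection-over-union definition. The sketch below computes mean IoU and pixel accuracy from a confusion matrix in plain Python 3. It is an illustration, not the evaluation code of this repository or of the benchmark servers. The ignore label of 255 and the choice to skip classes that never occur are assumptions, and benchmark scores are normally accumulated over the whole dataset rather than per image.

```python
def confusion_matrix(truth, pred, num_classes, ignore_label=255):
    """Rows index the true class, columns the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        if t == ignore_label:  # assumed "void" label; skipped entirely
            continue
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix):
    """Average of TP / (TP + FP + FN) over classes that occur at least once."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp
        fp = sum(matrix[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return 100.0 * sum(ious) / len(ious)


def pixel_accuracy(matrix):
    """Share of non-ignored pixels whose predicted class is correct."""
    correct = sum(matrix[c][c] for c in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


# Toy example: six pixels, three classes, one ignored pixel.
truth = [0, 0, 1, 1, 2, 255]
pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(truth, pred, num_classes=3)
print(round(mean_iou(cm), 2))  # 72.22
print(pixel_accuracy(cm))      # 80.0
```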

MIT Scene Parsing Benchmark (ADE20K):

Download the MIT Scene Parsing dataset, and put the extracted images into the directory:

data/ade20k/

with the following structure:

ade20k
|-- annotations
|   |-- training
|   `-- validation
`-- images
    |-- testing
    |-- training
    `-- validation

Check the performance of the pre-trained model:

python issegm/voc.py --data-root data/ade20k --output output --phase val --weights models/ade20k_rna-a1_cls150_s8_ep-0001.params --split val --test-scales 500 --test-flipping --test-steps 2 --gpus 0

Results on the val set:

model|testing scale|pixel accuracy (%)|mean IoU (%)|download
:---|:---:|:---:|:---:|:---:
[Model A1, 2 conv.](https://cdn.rawgit.com/itijyou/ademxapp/master/misc/ade20k_model_a1.pdf)|500|80.55|43.34|[aar](https://cloudstor.aarnet.edu.au/plus/index.php/s/E4JeZpmssK50kpn)

Citation

If you use this code or these models in your research, please cite:

@Misc{word.zifeng.2016,
    author = {Zifeng Wu and Chunhua Shen and Anton van den Hengel},
    title = {Wider or Deeper: {R}evisiting the ResNet Model for Visual Recognition},
    year = {2016},
    howpublished = {arXiv:1611.10080}
}

License

This code is for academic purposes only. For commercial use, please contact us.

Acknowledgement

This work is supported with supercomputing resources provided by the PSG cluster at NVIDIA and the Phoenix HPC service at the University of Adelaide.

Comments

How to run the semantic segmentation on my own images?

hi so I did successfully run the perf test on Pascal VOC with: python issegm/voc.py --data-root data/VOCdevkit --output output --phase val --weights models/voc_rna-a1_cls21_s8_ep-0001.params --split val --test-scales 500 --test-flipping --gpus 0

Now I only want to use the pre-trained model and test it on non-VOC images, so on my own images, but it seems that voc.py is highly specific to the voc dataset.

Can I use voc.py and some flag to run it on my own images or do I need to heavily modify the code so that it does not look for segmentationClass mask images from voc etc?

thanks Tets

opened by Tetsujinfr 11
Cityscapes Models Missing

Hi!

I am trying to test your method trained on the Cityscapes Dataset in another dataset, but couldn't find your model to download. Is it available?

Thanks!

opened by FBernuy 8
Training with new database

Hi,

I want to train the pre-trained model with other databases.

I just want to train a model with below script.

python issegm/voc.py --gpus 0,1,2,3 --split train --data-root ${New_database} --output output --model ${New_database}_rna-a1_cls19 --batch-images 16 --crop-size 500 --origin-size 2048 --scale-rate-range 0.7,1.3 --weights models/ilsvrc-cls_rna-a1_cls1000_ep-0001.params --lr-type fixed --base-lr 0.0016 --to-epoch 140 --kvstore local --prefetch-threads 4 --prefetcher thread --backward-do-mirror

Which parts of the code do I have to modify? It is not easy to understand the whole code base.

And do I have to use the same crop and origin sizes (--crop-size 500 --origin-size 2048) in order to use the pretrained weights?

Could you please explain it for me?

Thanks.

opened by DonghyunK 7
Expected training time on cityscape dataset

I am trying to train the cityscape model

However, it looks like my training gets stuck. I am using 1x K40c 12 GB and 1x Titan X 12 GB, with batch size equal to 200 and crop size 100. The last print from the terminal was: 2017-02-10 00:27:06,021 Host Epoch[0] Batch [4] Speed: 24.24 samples/sec Train-fcn_valid=0.270395

Does anyone know the expected training time on this data?

opened by ChristianEschen 6
Run Errors, help

python issegm/voc.py --data-root data/ade20k --output output --phase val --weight models/ade20k_rna-a1_cls150_s8_ep-0001.params --split val --test-scales 504 --test-flipping --test-steps 2 --gpus 0

and I get an error at "from util import transformer as ts", because the script is being executed from within issegm/

then I add sys.path.append(os.getcwd()) at issegm/voc.py line 16

rerun 'python issegm/voc.py --data-root data/ade20k --output output --phase val --weight models/ade20k_rna-a1_cls150_s8_ep-0001.params --split val --test-scales 504 --test-flipping --test-steps 2 --gpus 0'

and I get

Traceback (most recent call last):
  File "issegm/voc.py", line 578, in <module>
    _val_impl(args, model_specs, logger)
  File "issegm/voc.py", line 435, in _val_impl
    _, net_args, net_auxs = util.load_params(args.from_model, args.from_epoch)
AttributeError: 'module' object has no attribute 'load_params'

so, what should I do next ? Thanks very much!

opened by izhangxm 4
Scale rates of multiscale test in cityscapes

When I use the given ResNet 38-A1 model pretrained on cityscapes and do single scale testing, I get the mIOU 78.08% at val dataset. As shown in the github page of ResNet 38, multiscale testing will boost the mIOU by about 1%. However, when I do multi-scale testing, (scales: 0.75, 0.875, 1) I get the mIOU 77.06%, that is lower than single scale testing. Could anyone tell me the scale rates for boosting the performance dramatically?

opened by yzou2 2
Errors in full image test on cityscapes val dataset

I use the following codes for segmentation cityscapes val dataset.

python issegm/voc.py --data-root data/cityscapes --output output --phase val --weights models/cityscapes_rna-a1_cls19_s8_ep-0001.params --split val --test-scales 2048 --test-flipping --gpus 0

But I get the errors given below. However, if I decrease --test-scales to 1800, everything runs smoothly. I am sure it's not a GPU memory issue (I use a Titan X 12 GB GPU). Any hints why this happens?

[22:28:43] /home/travis/build/dmlc/mxnet-distro/mxnet-build/dmlc-core/include/dmlc/logging.h:304: [22:28:43] src/operator/./cudnn_convolution-inl.h:572: Check failed: e == CUDNN_STATUS_SUCCESS (4 vs. 0) cuDNN: CUDNN_STATUS_INTERNAL_ERROR

Stack trace returned 8 entries: [bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x18b0dc) [0x7f5dd48610dc] [bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x1a64e8f) [0x7f5dd613ae8f] [bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x21e123) [0x7f5dd48f4123] [bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0xb7e5bc) [0x7f5dd52545bc] [bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0xb81590) [0x7f5dd5257590] [bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f5de7e91c80] [bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76fa) [0x7f5dee6566fa] [bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f5dee38cb5d]

[22:28:43] /home/travis/build/dmlc/mxnet-distro/mxnet-build/dmlc-core/include/dmlc/logging.h:304: [22:28:43] src/engine/./threaded_engine.h:329: [22:28:43] src/operator/./cudnn_convolution-inl.h:572: Check failed: e == CUDNN_STATUS_SUCCESS (4 vs. 0) cuDNN: CUDNN_STATUS_INTERNAL_ERROR

Stack trace returned 8 entries: [bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x18b0dc) [0x7f5dd48610dc] [bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x1a64e8f) [0x7f5dd613ae8f] [bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x21e123) [0x7f5dd48f4123] [bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0xb7e5bc) [0x7f5dd52545bc] [bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0xb81590) [0x7f5dd5257590] [bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f5de7e91c80] [bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76fa) [0x7f5dee6566fa] [bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f5dee38cb5d]

An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 6 entries: [bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x18b0dc) [0x7f5dd48610dc] [bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0xb7e84f) [0x7f5dd525484f] [bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0xb81590) [0x7f5dd5257590] [bt] (3) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f5de7e91c80] [bt] (4) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76fa) [0x7f5dee6566fa] [bt] (5) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f5dee38cb5d]

terminate called after throwing an instance of 'dmlc::Error' what(): [22:28:43] src/engine/./threaded_engine.h:329: [22:28:43] src/operator/./cudnn_convolution-inl.h:572: Check failed: e == CUDNN_STATUS_SUCCESS (4 vs. 0) cuDNN: CUDNN_STATUS_INTERNAL_ERROR

Stack trace returned 8 entries: [bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x18b0dc) [0x7f5dd48610dc] [bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x1a64e8f) [0x7f5dd613ae8f] [bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x21e123) [0x7f5dd48f4123] [bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0xb7e5bc) [0x7f5dd52545bc] [bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0xb81590) [0x7f5dd5257590] [bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f5de7e91c80] [bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76fa) [0x7f5dee6566fa] [bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f5dee38cb5d]

An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 6 entries: [bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x18b0dc) [0x7f5dd48610dc] [bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0xb7e84f) [0x7f5dd525484f] [bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0xb81590) [0x7f5dd5257590] [bt] (3) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f5de7e91c80] [bt] (4) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76fa) [0x7f5dee6566fa] [bt] (5) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f5dee38cb5d]

opened by yzou2 2
in the paper why the crop size is larger than the input size?

For networks trained with 224×224 inputs, the testing crop size is 320×320, following the setting used by He et al. [13]. For those with 112×112 and 56×56 inputs, we use 160×160 and 80×80 crops respectively.

As written in the paper, if we take 224×224 as the input size, why is the testing crop size 320×320? (I think it should also be 224×224.)

opened by dongzhuoyao 1
the predicted image is all black using trained model on VOC

Hello, I used the following command to train the segmentation model on PASCAL VOC 2012 /home/server6/Segmentation/Resnet/ademxapp-master/venv/bin/python2.7 issegm/voc.py --gpu 0,1 --split train --data-root /home/server6/xly/SSENet_self_supervised --output output --model voc_rna-a1_cls21 --batch-image 4 --crop-size 500 --origin-size 2048 --scale-rate-range 0.7,1.3 --weights /home/server6/xly/SSENet_self_supervised/weights/ilsvrc-cls_rna-a1_cls1000_ep-0001.params --lr-type fixed --base-lr 0.0016 --to-epoch 140 --kvstore local --prefetch-threads 8 --prefetcher process --backward-do-mirror

but when I used the trained model to predict the image in val set, the result is all black. Can you give me some advice on it? thanks~

opened by czzerone 0
Training on VOC from Scratch
Hi,

I am attempting to train this network on VOC from scratch, essentially trying to recreate the pre-trained weights available for download; however, after 70+ epochs, my model is still just predicting background for an mIOU of 3.49%. Here is the command I am running to train:

python issegm/voc.py --gpus 1,2,3 --split train --data-root data/VOCdevkit/ --output train_out/ --model voc_rna-a1_cls21 --batch-images 12 --crop-size 500 --origin-size 2048 --scale-rate-range 0.7,1.3 --lr-type fixed --base-lr 0.0016 --to-epoch 140 --kvstore local --prefetch-threads 4 --prefetcher thread --backward-do-mirror

Inside data/VOCdevkit/VOC2012 I have the original download of JPEGImages and SegmentationClass, which provides the full color segmentation images. Any help would be much appreciated.

Here's a snippet of output that may or may not help, showing fcn_valid moving a lot. I'm not entirely sure what the output means, so any explanation on what it is could be useful.

2019-04-11 15:00:09,073 Host Epoch[78] Batch [66-67] Speed: 11.93 samples/sec fcn_valid=0.623302
2019-04-11 15:00:10,056 Host Labels: 0 0.6 -1.0
2019-04-11 15:00:10,058 Host Labels: 0 0.6 -1.0
Waited for 2.59876251221e-05 seconds
2019-04-11 15:00:10,075 Host Epoch[78] Batch [67-68] Speed: 11.98 samples/sec fcn_valid=0.644102
2019-04-11 15:00:10,076 Host Labels: 0 0.6 -1.0
2019-04-11 15:00:11,055 Host Labels: 0 0.6 -1.0
2019-04-11 15:00:11,056 Host Labels: 0 0.6 -1.0
Waited for 3.50475311279e-05 seconds
2019-04-11 15:00:11,074 Host Labels: 0 0.6 -1.0
2019-04-11 15:00:11,077 Host Epoch[78] Batch [68-69] Speed: 11.98 samples/sec fcn_valid=0.632405
2019-04-11 15:00:12,056 Host Labels: 0 0.6 -1.0
2019-04-11 15:00:12,058 Host Labels: 0 0.6 -1.0
Waited for 2.50339508057e-05 seconds
2019-04-11 15:00:12,074 Host Labels: 0 0.6 -1.0
2019-04-11 15:00:12,077 Host Epoch[78] Batch [69-70] Speed: 12.00 samples/sec fcn_valid=0.775874
2019-04-11 15:00:13,057 Host Labels: 0 0.6 -1.0
2019-04-11 15:00:13,058 Host Labels: 0 0.6 -1.0
Waited for 2.59876251221e-05 seconds
2019-04-11 15:00:13,074 Host Labels: 0 0.6 -1.0
2019-04-11 15:00:13,077 Host Epoch[78] Batch [70-71] Speed: 12.01 samples/sec fcn_valid=0.562744
2019-04-11 15:00:14,056 Host Labels: 0 0.6 -1.0
2019-04-11 15:00:14,058 Host Labels: 0 0.6 -1.0
Waited for 0.000184059143066 seconds
2019-04-11 15:00:14,074 Host Labels: 0 0.6 -1.0
2019-04-11 15:00:14,075 Host Epoch[78] Batch [71-72] Speed: 12.03 samples/sec fcn_valid=0.552027
opened by mcever 2
pre-trained model in ade20k

Hi, I wanna know where can I download "ade20k_rna-a1_cls150_s8_ep-0001.params" ? I read through readme in detail, but still didn't find the link for it. can you give me the link?? thank you very much!!

opened by ghost 1
Where to get pretrained city scape model (cityscapes_rna-a1_cls19_s8_ep-0001.params)?

I can download the Cityscapes dataset, but there's no model. The models directory is empty. Further, the Cityscapes website has no downloadable pre-trained nets, just data. I open the checksum.md file and I see the following:

1faf29850bfa194678f0b8e1cbbffa98 ade20k_rna-a1_cls150_s8_ep-0001.params
226b3e861a6be7d0dc84e537f4eab154 cityscapes_rna-a1_cls19_s8_ep-0001.params
ff21f45d6bf03284100dcbec571edfad ilsvrc-cls_rna-a1_cls1000_ep-0001.params
2421c1945b6797cecd3f89db14ca73f6 ilsvrc-cls_rna-a_cls1000_ep-0001.params
328c0eca0c45b6345ada2f95edce68d4 voc_rna-a1_cls21_s8_coco_ep-0001.params
a34628a63d5f62dcb98c29c4e281f332 voc_rna-a1_cls21_s8_ep-0001.params

Where can I get these param files?

Thanks

opened by MontyTHall 4

Owner

Zifeng Wu

Postdoctoral researcher at the University of Adelaide

GitHub

Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations. [2021]

Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations This repo contains the Pytorch implementation of our paper: Revisit

80 Nov 20, 2022

NFT-Price-Prediction-CNN - Using visual feature extraction, prices of NFTs are predicted via CNN (Alexnet and Resnet) architectures.

5 Nov 3, 2022

PyTorch implementation of the R2Plus1D convolution based ResNet architecture described in the paper "A Closer Look at Spatiotemporal Convolutions for Action Recognition"

R2Plus1D-PyTorch PyTorch implementation of the R2Plus1D convolution based ResNet architecture described in the paper "A Closer Look at Spatiotemporal

342 Dec 16, 2022

《DeepViT: Towards Deeper Vision Transformer》(2021)

DeepViT This repo is the official implementation of "DeepViT: Towards Deeper Vision Transformer". The repo is based on the timm library (https://githu

109 Dec 2, 2022

U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection

The code for our newly accepted paper in Pattern Recognition 2020: "U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection."

6.5k Jan 9, 2023

[Preprint] "Bag of Tricks for Training Deeper Graph Neural Networks A Comprehensive Benchmark Study" by Tianlong Chen, Kaixiong Zhou, Keyu Duan, Wenqing Zheng, Peihao Wang, Xia Hu, Zhangyang Wang

Bag of Tricks for Training Deeper Graph Neural Networks: A Comprehensive Benchmark Study Codes for [Preprint] Bag of Tricks for Training Deeper Graph

101 Dec 29, 2022

