Wider or Deeper: Revisiting the ResNet Model for Visual Recognition

Overview

ademxapp

Visual applications by the University of Adelaide

In designing our Model A, we did not over-optimize its structure for efficiency unless it was necessary, which led us to a high-performance model without non-trivial building blocks. By doing so, we also expect this model and its trivial variants to perform well when fine-tuned for new tasks, given their better spatial efficiency and larger model sizes compared with conventional ResNet models.

In this work, we try to find a proper depth for ResNets without grid-searching the whole space, which can be prohibitively costly, e.g., on the ILSVRC 2012 classification dataset. For more details, refer to our report: Wider or Deeper: Revisiting the ResNet Model for Visual Recognition.

This code is a refactored version of the one that we used in the competition. It has not yet been tested extensively, so feel free to open an issue if you find any problems.

To use, first install MXNet.

Updates

  • Recent updates
    • Model A1 trained on Cityscapes
    • Model A1 trained on VOC
    • Training code for semantic image segmentation
    • Training code for image classification on ILSVRC 2012 (Still needs to be evaluated.)
  • History
    • Results on VOC using COCO for pre-training
    • Fixed a bug in testing caused by changing the EPS in BatchNorm layers
    • Model A1 for ADE20K trained using the train set with testing code
    • Segmentation results with multi-scale testing on VOC and Cityscapes
    • Model A and Model A1 for ILSVRC with testing code
    • Segmentation results with single-scale testing on VOC and Cityscapes

Image classification

Pre-trained models

  1. Download the ILSVRC 2012 classification val set (6.3 GB), and put the extracted images into the directory:

    data/ilsvrc12/ILSVRC2012_val/
    
  2. Download the models as below, and put them into the directory:

    models/
    
  3. Check the classification performance of pre-trained models on the ILSVRC 2012 val set:

    python iclass/ilsvrc.py --data-root data/ilsvrc12 --output output --batch-images 10 --phase val --weights models/ilsvrc-cls_rna-a_cls1000_ep-0001.params --split val --test-scales 320 --gpus 0 --no-choose-interp-method --pool-top-infer-style caffe
    
    python iclass/ilsvrc.py --data-root data/ilsvrc12 --output output --batch-images 10 --phase val --weights models/ilsvrc-cls_rna-a1_cls1000_ep-0001.params --split val --test-scales 320 --gpus 0 --no-choose-interp-method
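
If a downloaded .params file seems broken, a quick sanity check is to load it directly with MXNet and list a few parameter arrays. This is only an illustrative sketch, not part of the repo's scripts; it assumes the Model A1 weights have been placed under models/ as in step 2:

    import mxnet as mx

    # Load the raw parameter file and list a few arrays; the keys typically
    # follow MXNet's checkpoint convention ('arg:*' weights, 'aux:*' stats).
    params = mx.nd.load('models/ilsvrc-cls_rna-a1_cls1000_ep-0001.params')
    print('number of arrays: %d' % len(params))
    for name in sorted(params)[:5]:
        print('%s %s' % (name, params[name].shape))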

Results on the ILSVRC 2012 val set tested with a single scale (320, without flipping):

model|top-1 error (%)|top-5 error (%)|download
:---:|:---:|:---:|:---:
[Model A](https://cdn.rawgit.com/itijyou/ademxapp/master/misc/ilsvrc_model_a.pdf)|19.20|4.73|[aar](https://cloudstor.aarnet.edu.au/plus/index.php/s/V7dncO4H0ijzeRj)
[Model A1](https://cdn.rawgit.com/itijyou/ademxapp/master/misc/ilsvrc_model_a1.pdf)|19.54|4.75|[aar](https://cloudstor.aarnet.edu.au/plus/index.php/s/NOPhJ247fhVDnZH)

Note: Due to a change of MXNet in padding at pooling layers, some of the computed feature maps in Model A will have different sizes from those stated in our report. However, this has no effect on Model A1, which always uses convolution layers (instead of pooling layers) for down-sampling. So, in most cases, just use Model A1, which was initialized from Model A, and tuned for 45k extra iterations.
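
To make the distinction concrete, here is a minimal, illustrative MXNet symbol sketch (hypothetical layer names, not this repo's network definition) contrasting down-sampling by a pooling layer, whose padding behaviour changed in MXNet, with down-sampling by a strided convolution as used throughout Model A1:

    import mxnet as mx

    data = mx.sym.Variable('data')  # an (N, C, H, W) feature map

    # Down-sampling with a max-pooling layer (as in Model A); its padding
    # behaviour is what changed between MXNet versions, as noted above.
    pooled = mx.sym.Pooling(data=data, kernel=(3, 3), stride=(2, 2),
                            pool_type='max', name='down_pool')

    # Down-sampling with a strided 3x3 convolution (as in Model A1), which is
    # unaffected by the pooling padding change.
    convolved = mx.sym.Convolution(data=data, kernel=(3, 3), stride=(2, 2),
                                   pad=(1, 1), num_filter=64, name='down_conv')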

New models

  1. Find a machine with 4 GPUs, each with at least 11 GB of memory.

  2. Download the ILSVRC 2012 classification train set (138 GB), and put the extracted images into the directory (a sketch for unpacking the per-class tars follows this list):

    data/ilsvrc12/ILSVRC2012_train/
    

    with the following structure:

    ILSVRC2012_train
    |-- n01440764
    |-- n01443537
    |-- ...
    `-- n15075141
    
  3. Train a new Model A from scratch, and check its performance:

    python iclass/ilsvrc.py --gpus 0,1,2,3 --data-root data/ilsvrc12 --output output --model ilsvrc-cls_rna-a_cls1000 --batch-images 256 --crop-size 224 --lr-type linear --base-lr 0.1 --to-epoch 90 --kvstore local --prefetch-threads 8 --prefetcher process --backward-do-mirror
    
    python iclass/ilsvrc.py --data-root data/ilsvrc12 --output output --batch-images 10 --phase val --weights output/ilsvrc-cls_rna-a_cls1000_ep-0090.params --split val --test-scales 320 --gpus 0
  4. Tune a Model A1 from our released Model A, and check its performance:

    python iclass/ilsvrc.py --gpus 0,1,2,3 --data-root data/ilsvrc12 --output output --model ilsvrc-cls_rna-a1_cls1000_from-a --batch-images 256 --crop-size 224 --weights models/ilsvrc-cls_rna-a_cls1000_ep-0001.params --lr-type linear --base-lr 0.01 --to-epoch 9 --kvstore local --prefetch-threads 8 --prefetcher process --backward-do-mirror
    
    python iclass/ilsvrc.py --data-root data/ilsvrc12 --output output --batch-images 10 --phase val --weights output/ilsvrc-cls_rna-a1_cls1000_from-a_ep-0009.params --split val --test-scales 320 --gpus 0
  5. Or train a new Model A1 from scratch, and check its performance:

    python iclass/ilsvrc.py --gpus 0,1,2,3 --data-root data/ilsvrc12 --output output --model ilsvrc-cls_rna-a1_cls1000 --batch-images 256 --crop-size 224 --lr-type linear --base-lr 0.1 --to-epoch 90 --kvstore local --prefetch-threads 8 --prefetcher process --backward-do-mirror
    
    python iclass/ilsvrc.py --data-root data/ilsvrc12 --output output --batch-images 10 --phase val --weights output/ilsvrc-cls_rna-a1_cls1000_ep-0090.params --split val --test-scales 320 --gpus 0
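
For step 2 above, the official ILSVRC2012_img_train.tar contains one inner tar per synset, so it has to be unpacked twice to obtain the per-class folders. A minimal Python sketch (illustrative only, with paths you may need to adjust):

    import os
    import tarfile

    # Unpack ILSVRC2012_img_train.tar (one inner tar per synset) into
    # per-class folders, e.g. data/ilsvrc12/ILSVRC2012_train/n01440764/*.JPEG
    src = 'ILSVRC2012_img_train.tar'          # path to the downloaded archive
    dst = 'data/ilsvrc12/ILSVRC2012_train'    # target directory used above

    with tarfile.open(src) as outer:
        for member in outer.getmembers():
            synset = os.path.splitext(member.name)[0]   # e.g. 'n01440764'
            class_dir = os.path.join(dst, synset)
            if not os.path.exists(class_dir):
                os.makedirs(class_dir)
            # each inner member is itself a tar of JPEG images for one class
            with tarfile.open(fileobj=outer.extractfile(member)) as inner:
                inner.extractall(class_dir)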

Training from scratch took more than 40 days on our workstation with 4 Maxwell GTX Titan cards, so be patient, or try the smaller models described in our report.

Note: The best setting (prefetch-threads and prefetcher) for efficiency can vary depending on the circumstances (the provided CPUs, GPUs, and filesystem).

Note: This code may not accurately reproduce our reported results, since there are subtle differences in implementation, e.g., different cropping strategies, interpolation methods, and padding strategies.

Semantic image segmentation

We show the effectiveness of our models (as pre-trained features) in semantic image segmentation, using plain dilated FCNs initialized from them. Several Model A1 variants tuned on the train sets of PASCAL VOC, Cityscapes, and ADE20K are available.
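
For context, these dilated FCNs keep the feature resolution at a coarser stride (presumably the `s8` in the model names refers to stride 8) by using dilated 3x3 convolutions instead of further down-sampling. Below is a minimal, illustrative MXNet sketch of such a layer followed by a 1x1 classifier; it is not the repo's actual model code:

    import mxnet as mx

    features = mx.sym.Variable('features')  # backbone feature map at stride 8

    # A 3x3 convolution with dilation 2 (and matching padding) enlarges the
    # receptive field without further reducing the spatial resolution.
    head = mx.sym.Convolution(data=features, kernel=(3, 3), dilate=(2, 2),
                              pad=(2, 2), num_filter=512, name='dilated_conv')
    head = mx.sym.Activation(data=head, act_type='relu')

    # A 1x1 convolution then produces per-class score maps, e.g. 21 classes
    # for PASCAL VOC, which are upsampled and compared with the label map.
    scores = mx.sym.Convolution(data=head, kernel=(1, 1), num_filter=21,
                                name='cls_scores')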

  • To use, download and put them into the directory:

    models/
    

PASCAL VOC 2012:

  1. Download the PASCAL VOC 2012 dataset (2 GB), and put the extracted images into the directory:

    data/VOCdevkit/VOC2012
    

    with the following structure:

    VOC2012
    |-- JPEGImages
    |-- SegmentationClass
    `-- ...
    
  2. Check the performance of the pre-trained models:

    python issegm/voc.py --data-root data/VOCdevkit --output output --phase val --weights models/voc_rna-a1_cls21_s8_ep-0001.params --split val --test-scales 500 --test-flipping --gpus 0
    
    python issegm/voc.py --data-root data/VOCdevkit --output output --phase val --weights models/voc_rna-a1_cls21_s8_coco_ep-0001.params --split val --test-scales 500 --test-flipping --gpus 0

Results on the val set:

model|training data|testing scale|mean IoU (%)|download
:---|:---:|:---:|:---:|:---:
Model A1, 2 conv.|VOC; SBD|500|80.84|[aar](https://cloudstor.aarnet.edu.au/plus/index.php/s/YqNptRcboMD44Kd)
Model A1, 2 conv.|VOC; SBD; COCO|500|82.86|[aar](https://cloudstor.aarnet.edu.au/plus/index.php/s/JKWePbLPlpfRDW4)

Results on the test set:

model|training data|testing scale|mean IoU (%)
:---|:---:|:---:|:---:
Model A1, 2 conv.|VOC; SBD|500|[82.5](http://host.robots.ox.ac.uk:8080/anonymous/H0KLZK.html)
Model A1, 2 conv.|VOC; SBD|multiple|[83.1](http://host.robots.ox.ac.uk:8080/anonymous/BEWE9S.html)
Model A1, 2 conv.|VOC; SBD; COCO|multiple|[84.9](http://host.robots.ox.ac.uk:8080/anonymous/JU1PXP.html)
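
For reference, the mean IoU numbers above average the per-class intersection-over-union, typically accumulated in a confusion matrix over the whole set. A minimal numpy sketch of the metric (not the evaluation code used by this repo or the VOC server):

    import numpy as np

    def mean_iou(conf):
        """conf[i, j] counts pixels with ground-truth class i predicted as j."""
        conf = conf.astype(np.float64)
        tp = np.diag(conf)
        denom = conf.sum(axis=1) + conf.sum(axis=0) - tp  # GT + pred - intersection
        iou = tp / np.maximum(denom, 1)                   # avoid division by zero
        return iou.mean()

    # Example with 3 classes: rows are ground truth, columns are predictions.
    conf = np.array([[50, 2, 3],
                     [4, 40, 1],
                     [2, 2, 46]])
    print('mean IoU: %.4f' % mean_iou(conf))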

Cityscapes:

  1. Download the Cityscapes dataset, and put the extracted images into the directory:

    data/cityscapes
    

    with the following structure:

    cityscapes
    |-- gtFine
    `-- leftImg8bit
    
  2. Clone the official Cityscapes toolkit:

    git clone https://github.com/mcordts/cityscapesScripts.git data/cityscapesScripts
  3. Check the performance of the pre-trained model:

    python issegm/voc.py --data-root data/cityscapes --output output --phase val --weights models/cityscapes_rna-a1_cls19_s8_ep-0001.params --split val --test-scales 2048 --test-flipping --gpus 0
  4. Tune a Model A1, and check its performance:

    python issegm/voc.py --gpus 0,1,2,3 --split train --data-root data/cityscapes --output output --model cityscapes_rna-a1_cls19_s8 --batch-images 16 --crop-size 500 --origin-size 2048 --scale-rate-range 0.7,1.3 --weights models/ilsvrc-cls_rna-a1_cls1000_ep-0001.params --lr-type fixed --base-lr 0.0016 --to-epoch 140 --kvstore local --prefetch-threads 8 --prefetcher process --cache-images 0 --backward-do-mirror
    
    python issegm/voc.py --gpus 0,1,2,3 --split train --data-root data/cityscapes --output output --model cityscapes_rna-a1_cls19_s8_x1-140 --batch-images 16 --crop-size 500 --origin-size 2048 --scale-rate-range 0.7,1.3 --weights output/cityscapes_rna-a1_cls19_s8_ep-0140.params --lr-type linear --base-lr 0.0008 --to-epoch 64 --kvstore local --prefetch-threads 8 --prefetcher process --cache-images 0 --backward-do-mirror
    
    python issegm/voc.py --data-root data/cityscapes --output output --phase val --weights output/cityscapes_rna-a1_cls19_s8_x1-140_ep-0064.params --split val --test-scales 2048 --test-flipping --gpus 0

Results on the val set:

model|training data|testing scale|mean IoU (%)|download
:---|:---:|:---:|:---:|:---:
Model A1, 2 conv.|fine|1024x2048|78.08|[aar](https://cloudstor.aarnet.edu.au/plus/index.php/s/2hbvpro6J4XKVIu)

Results on the test set:

model|training data|testing scale|class IoU (%)|class iIoU (%)|category IoU (%)|category iIoU (%)
:---|:---:|:---:|:---:|:---:|:---:|:---:
Model A2, 2 conv.|fine|1024x2048|78.4|59.1|90.9|81.1
Model A2, 2 conv.|fine|multiple|79.4|58.0|91.0|80.1
Model A2, 2 conv.|fine; coarse|1024x2048|79.9|59.7|91.2|80.8
Model A2, 2 conv.|fine; coarse|multiple|80.6|57.8|91.0|79.1
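
The rows marked "multiple" average per-pixel class probabilities over several rescaled (and optionally flipped) copies of each image before taking the arg-max. A minimal, illustrative sketch of that idea, assuming a hypothetical `predict_probs(image, scale)` helper that returns class probabilities resized back to the original resolution:

    import numpy as np

    def multi_scale_predict(image, predict_probs, scales=(0.75, 1.0, 1.25)):
        # Average per-pixel class probabilities over several input scales,
        # then take the arg-max to obtain the final label map.
        prob_sum = None
        for s in scales:
            probs = predict_probs(image, s)   # (num_classes, H, W) at full size
            prob_sum = probs if prob_sum is None else prob_sum + probs
        return np.argmax(prob_sum / len(scales), axis=0)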

For more information, refer to the official leaderboard.

Note: Model A2 was initialized from Model A, and tuned for 45k extra iterations using the Places data in ILSVRC 2016.

MIT Scene Parsing Benchmark (ADE20K):

  1. Download the MIT Scene Parsing dataset, and put the extracted images into the directory:

    data/ade20k/
    

    with the following structure:

    ade20k
    |-- annotations
    |   |-- training
    |   `-- validation
    `-- images
        |-- testing
        |-- training
        `-- validation
    
  2. Check the performance of the pre-trained model:

    python issegm/voc.py --data-root data/ade20k --output output --phase val --weights models/ade20k_rna-a1_cls150_s8_ep-0001.params --split val --test-scales 500 --test-flipping --test-steps 2 --gpus 0

Results on the val set:

model|testing scale|pixel accuracy (%)|mean IoU (%)|download
:---|:---:|:---:|:---:|:---:
[Model A1, 2 conv.](https://cdn.rawgit.com/itijyou/ademxapp/master/misc/ade20k_model_a1.pdf)|500|80.55|43.34|[aar](https://cloudstor.aarnet.edu.au/plus/index.php/s/E4JeZpmssK50kpn)

Citation

If you use this code or these models in your research, please cite:

@Misc{wu.zifeng.2016,
    author = {Zifeng Wu and Chunhua Shen and Anton van den Hengel},
    title = {Wider or Deeper: {R}evisiting the ResNet Model for Visual Recognition},
    year = {2016},
    howpublished = {arXiv:1611.10080}
}

License

This code is for academic use only. For commercial use, please contact us.

Acknowledgement

This work was supported by supercomputing resources provided by the PSG cluster at NVIDIA and the Phoenix HPC service at the University of Adelaide.

Comments
  • How to run the semantic segmentation on my own images?


    Hi, so I did successfully run the perf test on Pascal VOC with: python issegm/voc.py --data-root data/VOCdevkit --output output --phase val --weights models/voc_rna-a1_cls21_s8_ep-0001.params --split val --test-scales 500 --test-flipping --gpus 0

    Now I only want to use the pre-trained model and test it on non-VOC images, so on my own images, but it seems that voc.py is highly specific to the voc dataset.

    Can I use voc.py and some flag to run it on my own images or do I need to heavily modify the code so that it does not look for segmentationClass mask images from voc etc?

    thanks Tets

    opened by Tetsujinfr 11
  • Cityscapes Models Missing


    Hi!

    I am trying to test your method trained on the Cityscapes Dataset in another dataset, but couldn't find your model to download. Is it available?

    Thanks!

    opened by FBernuy 8
  • Training with new database


    Hi,

    I want to train the pre-trained model with other databases.

    I just want to train a model with below script.

    python issegm/voc.py --gpus 0,1,2,3 --split train --data-root ${New_database} --output output --model ${New_database}_rna-a1_cls19 --batch-images 16 --crop-size 500 --origin-size 2048 --scale-rate-range 0.7,1.3 --weights models/ilsvrc-cls_rna-a1_cls1000_ep-0001.params --lr-type fixed --base-lr 0.0016 --to-epoch 140 --kvstore local --prefetch-threads 4 --prefetcher thread --backward-do-mirror

    What parts of codes do I have to modify? It is not easy to understand whole codes.

    And, Do I have to use the same sizes of crop and origin (-crop-size 500 --origin-size 2048) in order to use pretrained weight?

    Could you please explain it for me?

    Thanks.

    opened by DonghyunK 7
  • Expected training time on cityscape dataset


    I am trying to train the cityscape model

    However, it looks like my training gets stuck. I am using one K40c (12 GB) and one Titan X (12 GB), with batch size 200 and crop size 100. The last print from the terminal was: 2017-02-10 00:27:06,021 Host Epoch[0] Batch [4] Speed: 24.24 samples/sec Train-fcn_valid=0.270395

    Does anyone know the expected training time on this data?

    opened by ChristianEschen 6
  • Run Errors, help


    python issegm/voc.py --data-root data/ade20k --output output --phase val --weight models/ade20k_rna-a1_cls150_s8_ep-0001.params --split val --test-scales 504 --test-flipping --test-steps 2 --gpus 0

    and I get error from util import transformer as ts because it is being executed from within the issegm/

    then I add sys.path.append(os.getcwd()) at issegm/voc.py line 16

    rerun 'python issegm/voc.py --data-root data/ade20k --output output --phase val --weight models/ade20k_rna-a1_cls150_s8_ep-0001.params --split val --test-scales 504 --test-flipping --test-steps 2 --gpus 0'

    and I get Traceback (most recent call last): File "issegm/voc.py", line 578, in <module> _val_impl(args, model_specs, logger) File "issegm/voc.py", line 435, in _val_impl _, net_args, net_auxs = util.load_params(args.from_model, args.from_epoch) AttributeError: 'module' object has no attribute 'load_params'

    so, what should I do next ? Thanks very much!

    opened by izhangxm 4
  • Scale rates of multiscale test in cityscapes


    When I use the given ResNet 38-A1 model pretrained on cityscapes and do single scale testing, I get the mIOU 78.08% at val dataset. As shown in the github page of ResNet 38, multiscale testing will boost the mIOU by about 1%. However, when I do multi-scale testing, (scales: 0.75, 0.875, 1) I get the mIOU 77.06%, that is lower than single scale testing. Could anyone tell me the scale rates for boosting the performance dramatically?

    opened by yzou2 2
  • Errors in full image test on cityscapes val dataset


    I use the following codes for segmentation cityscapes val dataset.

    python issegm/voc.py --data-root data/cityscapes --output output --phase val --weights models/cityscapes_rna-a1_cls19_s8_ep-0001.params --split val --test-scales 2048 --test-flipping --gpus 0

    But I get the errors given below. However, if I decrease --test-scales to 1800, everything runs smoothly. I am sure it's not because of GPU memory issues (I use a Titan X 12 GB GPU). Any hints why this happens?


    [22:28:43] /home/travis/build/dmlc/mxnet-distro/mxnet-build/dmlc-core/include/dmlc/logging.h:304: [22:28:43] src/operator/./cudnn_convolution-inl.h:572: Check failed: e == CUDNN_STATUS_SUCCESS (4 vs. 0) cuDNN: CUDNN_STATUS_INTERNAL_ERROR

    Stack trace returned 8 entries: [bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x18b0dc) [0x7f5dd48610dc] [bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x1a64e8f) [0x7f5dd613ae8f] [bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x21e123) [0x7f5dd48f4123] [bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0xb7e5bc) [0x7f5dd52545bc] [bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0xb81590) [0x7f5dd5257590] [bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f5de7e91c80] [bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76fa) [0x7f5dee6566fa] [bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f5dee38cb5d]

    [22:28:43] /home/travis/build/dmlc/mxnet-distro/mxnet-build/dmlc-core/include/dmlc/logging.h:304: [22:28:43] src/engine/./threaded_engine.h:329: [22:28:43] src/operator/./cudnn_convolution-inl.h:572: Check failed: e == CUDNN_STATUS_SUCCESS (4 vs. 0) cuDNN: CUDNN_STATUS_INTERNAL_ERROR

    Stack trace returned 8 entries: [bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x18b0dc) [0x7f5dd48610dc] [bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x1a64e8f) [0x7f5dd613ae8f] [bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x21e123) [0x7f5dd48f4123] [bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0xb7e5bc) [0x7f5dd52545bc] [bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0xb81590) [0x7f5dd5257590] [bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f5de7e91c80] [bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76fa) [0x7f5dee6566fa] [bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f5dee38cb5d]

    An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

    Stack trace returned 6 entries: [bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x18b0dc) [0x7f5dd48610dc] [bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0xb7e84f) [0x7f5dd525484f] [bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0xb81590) [0x7f5dd5257590] [bt] (3) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f5de7e91c80] [bt] (4) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76fa) [0x7f5dee6566fa] [bt] (5) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f5dee38cb5d]

    terminate called after throwing an instance of 'dmlc::Error' what(): [22:28:43] src/engine/./threaded_engine.h:329: [22:28:43] src/operator/./cudnn_convolution-inl.h:572: Check failed: e == CUDNN_STATUS_SUCCESS (4 vs. 0) cuDNN: CUDNN_STATUS_INTERNAL_ERROR

    Stack trace returned 8 entries: [bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x18b0dc) [0x7f5dd48610dc] [bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x1a64e8f) [0x7f5dd613ae8f] [bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x21e123) [0x7f5dd48f4123] [bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0xb7e5bc) [0x7f5dd52545bc] [bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0xb81590) [0x7f5dd5257590] [bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f5de7e91c80] [bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76fa) [0x7f5dee6566fa] [bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f5dee38cb5d]

    An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

    Stack trace returned 6 entries: [bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x18b0dc) [0x7f5dd48610dc] [bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0xb7e84f) [0x7f5dd525484f] [bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0xb81590) [0x7f5dd5257590] [bt] (3) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f5de7e91c80] [bt] (4) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76fa) [0x7f5dee6566fa] [bt] (5) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f5dee38cb5d]

    opened by yzou2 2
  • In the paper, why is the crop size larger than the input size?

    For networks trained with 224×224 inputs, the testing crop size is 320×320, following the setting used by He et al. [13]. For those with 112×112 and 56×56 inputs, we use 160×160 and 80×80 crops respectively.

    As written in the paper, if we take 224*224 as the input size, why is the testing crop size 320*320? (I think it should also be 224*224.)

    opened by dongzhuoyao 1
  • the predicted image is all black using trained model on VOC


    Hello, I used the following command to train the segmentation model on PASCAL VOC 2012 /home/server6/Segmentation/Resnet/ademxapp-master/venv/bin/python2.7 issegm/voc.py --gpu 0,1 --split train --data-root /home/server6/xly/SSENet_self_supervised --output output --model voc_rna-a1_cls21 --batch-image 4 --crop-size 500 --origin-size 2048 --scale-rate-range 0.7,1.3 --weights /home/server6/xly/SSENet_self_supervised/weights/ilsvrc-cls_rna-a1_cls1000_ep-0001.params --lr-type fixed --base-lr 0.0016 --to-epoch 140 --kvstore local --prefetch-threads 8 --prefetcher process --backward-do-mirror

    but when I used the trained model to predict the image in val set, the result is all black. Can you give me some advice on it? thanks~

    opened by czzerone 0
  • Training on VOC from Scratch


    Hi,

    I am attempting to train this network on VOC from scratch, essentially trying to recreate the pre-trained weights available for download; however, after 70+ epochs, my model is still just predicting background for an mIOU of 3.49%. Here is the command I am running to train:

    python issegm/voc.py --gpus 1,2,3 --split train --data-root data/VOCdevkit/ --output train_out/ --model voc_rna-a1_cls21 --batch-images 12 --crop-size 500 --origin-size 2048 --scale-rate-range 0.7,1.3 --lr-type fixed --base-lr 0.0016 --to-epoch 140 --kvstore local --prefetch-threads 4 --prefetcher thread --backward-do-mirror
    

    Inside data/VOCdevkit/VOC2012 I have the original download of JPEGImages and SegmentationClass, which provides the full color segmentation images. Any help would be much appreciated.

    Here's a snippet of output that may or may not help, showing fcn_valid moving a lot. I'm not entirely sure what the output means, so any explanation on what it is could be useful.

    2019-04-11 15:00:09,073 Host Epoch[78] Batch [66-67] Speed: 11.93 samples/sec fcn_valid=0.623302 2019-04-11 15:00:10,056 Host Labels: 0 0.6 -1.0 2019-04-11 15:00:10,058 Host Labels: 0 0.6 -1.0 Waited for 2.59876251221e-05 seconds 2019-04-11 15:00:10,075 Host Epoch[78] Batch [67-68] Speed: 11.98 samples/sec fcn_valid=0.644102 2019-04-11 15:00:10,076 Host Labels: 0 0.6 -1.0 2019-04-11 15:00:11,055 Host Labels: 0 0.6 -1.0 2019-04-11 15:00:11,056 Host Labels: 0 0.6 -1.0 Waited for 3.50475311279e-05 seconds 2019-04-11 15:00:11,074 Host Labels: 0 0.6 -1.0 2019-04-11 15:00:11,077 Host Epoch[78] Batch [68-69] Speed: 11.98 samples/sec fcn_valid=0.632405 2019-04-11 15:00:12,056 Host Labels: 0 0.6 -1.0 2019-04-11 15:00:12,058 Host Labels: 0 0.6 -1.0 Waited for 2.50339508057e-05 seconds 2019-04-11 15:00:12,074 Host Labels: 0 0.6 -1.0 2019-04-11 15:00:12,077 Host Epoch[78] Batch [69-70] Speed: 12.00 samples/sec fcn_valid=0.775874 2019-04-11 15:00:13,057 Host Labels: 0 0.6 -1.0 2019-04-11 15:00:13,058 Host Labels: 0 0.6 -1.0 Waited for 2.59876251221e-05 seconds 2019-04-11 15:00:13,074 Host Labels: 0 0.6 -1.0 2019-04-11 15:00:13,077 Host Epoch[78] Batch [70-71] Speed: 12.01 samples/sec fcn_valid=0.562744 2019-04-11 15:00:14,056 Host Labels: 0 0.6 -1.0 2019-04-11 15:00:14,058 Host Labels: 0 0.6 -1.0 Waited for 0.000184059143066 seconds 2019-04-11 15:00:14,074 Host Labels: 0 0.6 -1.0 2019-04-11 15:00:14,075 Host Epoch[78] Batch [71-72] Speed: 12.03 samples/sec fcn_valid=0.552027

    opened by mcever 2
  • pre-trained model in ade20k


    Hi, I want to know where I can download "ade20k_rna-a1_cls150_s8_ep-0001.params"? I read through the readme in detail, but still didn't find the link for it. Can you give me the link? Thank you very much!

    opened by ghost 1
  • Where to get pretrained city scape model (cityscapes_rna-a1_cls19_s8_ep-0001.params)?


    I can download the Cityscapes dataset, but there's no model. The models directory is empty; further, the Cityscapes website has no downloadable pre-trained nets, just data. I opened the checksum.md file and I see the following:

    1faf29850bfa194678f0b8e1cbbffa98  ade20k_rna-a1_cls150_s8_ep-0001.params
    226b3e861a6be7d0dc84e537f4eab154  cityscapes_rna-a1_cls19_s8_ep-0001.params
    ff21f45d6bf03284100dcbec571edfad  ilsvrc-cls_rna-a1_cls1000_ep-0001.params
    2421c1945b6797cecd3f89db14ca73f6  ilsvrc-cls_rna-a_cls1000_ep-0001.params
    328c0eca0c45b6345ada2f95edce68d4  voc_rna-a1_cls21_s8_coco_ep-0001.params
    a34628a63d5f62dcb98c29c4e281f332  voc_rna-a1_cls21_s8_ep-0001.params

    Where can I get these param files?

    Thanks

    opened by MontyTHall 4
Owner
Zifeng Wu
Postdoctoral researcher at the University of Adelaide