DeepLab2: A TensorFlow Library for Deep Labeling

Overview


DeepLab2 is a TensorFlow library for deep labeling, aiming to provide a unified and state-of-the-art TensorFlow codebase for dense pixel labeling tasks, including, but not limited to, semantic segmentation, instance segmentation, panoptic segmentation, depth estimation, and even video panoptic segmentation.

Deep labeling refers to solving computer vision problems by assigning a predicted value to each pixel in an image with a deep neural network. As long as the problem of interest can be formulated this way, DeepLab2 should serve the purpose. Additionally, this codebase includes our recent and state-of-the-art research models on deep labeling. We hope you will find it useful for your projects.

Installation

See Installation.

Dataset preparation

The dataset needs to be converted to TFRecord format. We provide conversion examples for several popular datasets, along with guidance on how to convert your own dataset.
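
At a high level, each record pairs an encoded image with its encoded label map. The following sketch is illustrative only; consult the conversion scripts shipped with DeepLab2 for the authoritative feature keys and sharding (the ground-truth key below is the one that appears in DeepLab2's own parser error messages):

    import tensorflow as tf

    # Minimal sketch of writing one image/label pair to a TFRecord shard.
    # image_bytes/label_bytes would be PNG-encoded strings in practice.
    def make_example(image_bytes, label_bytes, filename):
        feature = {
            'image/encoded': tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[image_bytes])),
            'image/filename': tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[filename.encode()])),
            'image/segmentation/class/encoded': tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[label_bytes])),
        }
        return tf.train.Example(features=tf.train.Features(feature=feature))

    with tf.io.TFRecordWriter('train-00000-of-00001.tfrecord') as writer:
        writer.write(make_example(b'...', b'...', 'image_0').SerializeToString())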

Projects

We list a few projects that use DeepLab2.

Colab Demo

Running DeepLab2

See Getting Started. In short, to run DeepLab2 on GPUs, use the following command:

python trainer/train.py \
    --config_file=${CONFIG_FILE} \
    --mode={train | eval | train_and_eval | continuous_eval} \
    --model_dir=${BASE_MODEL_DIRECTORY} \
    --num_gpus=${NUM_GPUS}

Change logs

See Change logs for recent updates.

Contacts (Maintainers)

Please check the FAQ if you have questions before reporting an issue.

Disclaimer

  • Note that this library contains our re-implemented DeepLab models in TensorFlow 2, and thus may have some minor differences from the published papers (e.g., learning rate).

  • This is not an official Google product.

Citing DeepLab2

If you find DeepLab2 useful for your project, please consider citing DeepLab2 along with the relevant DeepLab series.

  • DeepLab2:
@article{deeplab2_2021,
  author={Mark Weber and Huiyu Wang and Siyuan Qiao and Jun Xie and Maxwell D. Collins and Yukun Zhu and Liangzhe Yuan and Dahun Kim and Qihang Yu and Daniel Cremers and Laura Leal-Taixe and Alan L. Yuille and Florian Schroff and Hartwig Adam and Liang-Chieh Chen},
  title={{DeepLab2: A TensorFlow Library for Deep Labeling}},
  journal={arXiv preprint arXiv:2106.09748},
  year={2021}
}


Comments
  • Cannot do evaluation on vip-deeplab

    Hi, @joe-siyuan-qiao

    Thanks for your work on deeplab2. I am trying to use deeplab2 to run ViP-DeepLab. Training runs fine, but I hit the following error during evaluation:

    E tensorflow/core/grappler/optimizers/meta_optimizer.cc:801] layout failed: Invalid argument: Size of values 3 does not match size of permutation 4 @ fanin shape inViPDeepLab/PostProcessor/StatefulPartitionedCall/while/body/_166/while/SelectV2_1-1-TransposeNHWCToNCHW-LayoutOptimizer

    By the way, how many GPUs should I use to reproduce the results in the paper (with the config in "resnet50_beta_os32.textproto", batch size = 4)?

    Thanks.
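
    (Not a verified fix, but for the "layout failed" message specifically: it comes from Grappler's layout optimizer rewriting the post-processor's while loop, and TensorFlow exposes a switch to turn that pass off. A minimal sketch of the knob:)

    import tensorflow as tf

    # Disable Grappler's layout optimizer (the NHWC<->NCHW rewriting named in
    # the error). The message is often non-fatal, but this is a cheap first
    # experiment if evaluation actually aborts.
    tf.config.optimizer.set_experimental_options({'layout_optimizer': False})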

    opened by HarborYuan 24
  • What is your environment for testing your model?

    Dear authors: Hi! Thanks for open-sourcing this repo. I have met several problems trying to run it.

    I got stuck during the evaluation process, as described in this issue: https://github.com/google-research/deeplab2/issues/58

    I made the change suggested there to run your model. I downloaded the Motion-DeepLab checkpoint from this repo and ran the evaluation process, but the results are nearly zero.

    Has anyone successfully run this repo?

    I suspect it may be an environment problem.

    I am using an RTX 3090 with TF 2.5 and CUDA 11.1.

    opened by lxtGH 11
  • Exported saved_model file performs badly compared to eval on the same images

    Hi,

    I trained two "panoptic segmentation" models, one with a resnet_beta backbone and one with swidernet, and I have the same issue with both. When I export the 60k checkpoint to a saved_model (if there's another format that might work better, I'm happy to try it), the export completes, but the exported models deliver inferior performance compared with the models' latest eval on the same images. The general shape of the masks is reasonable, but some spots are missing while, at the same time, other spots appear randomly in the image.

    I already tried disabling axial_use_recompute_grad, and that didn't help. Also, after the export process there's one file in the saved_model folder and two in the variables folder, but none in the assets folder.

    I'm using Windows 11 and an RTX 3090.

    Here's the output during the export process:

    2022-07-12 16:09:39.451488: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
    To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
    2022-07-12 16:09:40.292519: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 21676 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:21:00.0, compute capability: 8.6
    2022-07-12 16:09:40.293903: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 3624 MB memory: -> device: 1, name: Quadro P2200, pci bus id: 0000:02:00.0, compute capability: 6.1
    I0712 16:09:40.875252 10032 deeplab.py:57] Synchronized Batchnorm is used.
    I0712 16:09:40.877247 10032 axial_resnet_instances.py:144] Axial-ResNet final config: {'num_blocks': [3, 4, 6, 3], 'backbone_layer_multiplier': 1.0, 'width_multiplier': 1.0, 'stem_width_multiplier': 1.0, 'output_stride': 16, 'classification_mode': True, 'backbone_type': 'resnet_beta', 'use_axial_beyond_stride': 0, 'backbone_use_transformer_beyond_stride': 0, 'extra_decoder_use_transformer_beyond_stride': 32, 'backbone_decoder_num_stacks': 0, 'backbone_decoder_blocks_per_stage': 1, 'extra_decoder_num_stacks': 0, 'extra_decoder_blocks_per_stage': 1, 'max_num_mask_slots': 128, 'num_mask_slots': 128, 'memory_channels': 256, 'base_transformer_expansion': 1.0, 'global_feed_forward_network_channels': 256, 'high_resolution_output_stride': 4, 'activation': 'relu', 'block_group_config': {'attention_bottleneck_expansion': 2, 'drop_path_keep_prob': 1.0, 'drop_path_beyond_stride': 16, 'drop_path_schedule': 'constant', 'positional_encoding_type': None, 'use_global_beyond_stride': 0, 'use_sac_beyond_stride': -1, 'use_squeeze_and_excite': False, 'conv_use_recompute_grad': False, 'axial_use_recompute_grad': False, 'recompute_within_stride': 0, 'transformer_use_recompute_grad': False, 'axial_layer_config': {'query_shape': (129, 129), 'key_expansion': 1, 'value_expansion': 2, 'memory_flange': (32, 32), 'double_global_attention': False, 'num_heads': 8, 'use_query_rpe_similarity': True, 'use_key_rpe_similarity': True, 'use_content_similarity': True, 'retrieve_value_rpe': True, 'retrieve_value_content': True, 'initialization_std_for_query_key_rpe': 1.0, 'initialization_std_for_value_rpe': 1.0, 'self_attention_activation': 'softmax'}, 'dual_path_transformer_layer_config': {'num_heads': 8, 'bottleneck_expansion': 2, 'key_expansion': 1, 'value_expansion': 2, 'feed_forward_network_channels': 2048, 'use_memory_self_attention': True, 'use_pixel2memory_feedback_attention': True, 'transformer_activation': 'softmax'}}, 'bn_layer': functools.partial(<class 'keras.layers.normalization.batch_normalization.SyncBatchNormalization'>, momentum=0.9900000095367432, epsilon=0.0010000000474974513), 'conv_kernel_weight_decay': 0.0}
    I0712 16:09:40.994932 10032 deeplab.py:96] Setting pooling size to (27, 41)
    I0712 16:09:40.994932 10032 aspp.py:135] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
    I0712 16:09:41.424783 10032 api.py:459] Eval with scales ListWrapper([1.0])
    I0712 16:09:41.576378 10032 api.py:459] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
    I0712 16:09:41.577375 10032 api.py:459] Eval scale 1.0; setting pooling size to [27, 41]
    I0712 16:09:44.969014 10032 api.py:459] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
    I0712 16:09:46.971171 10032 api.py:459] Eval with scales ListWrapper([1.0])
    I0712 16:09:46.973166 10032 api.py:459] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
    I0712 16:09:46.974163 10032 api.py:459] Eval scale 1.0; setting pooling size to [27, 41]
    I0712 16:09:47.611459 10032 api.py:459] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
    I0712 16:09:48.245076 10032 api.py:459] Eval with scales ListWrapper([1.0])
    I0712 16:09:48.247071 10032 api.py:459] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
    I0712 16:09:48.248068 10032 api.py:459] Eval scale 1.0; setting pooling size to [27, 41]
    I0712 16:09:48.894341 10032 api.py:459] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
    I0712 16:09:49.202517 10032 api.py:459] Eval with scales ListWrapper([1.0])
    I0712 16:09:49.204511 10032 api.py:459] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
    I0712 16:09:49.204511 10032 api.py:459] Eval scale 1.0; setting pooling size to [27, 41]
    I0712 16:09:49.855770 10032 api.py:459] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
    WARNING:tensorflow:Skipping full serialization of Keras layer <deeplab2.model.layers.resized_fuse.ResizedFuse object at 0x000001398E79FE20>, because it is not built.
    W0712 16:09:52.798902 10032 save_impl.py:71] Skipping full serialization of Keras layer <deeplab2.model.layers.resized_fuse.ResizedFuse object at 0x000001398E79FE20>, because it is not built.
    WARNING:tensorflow:Skipping full serialization of Keras layer <deeplab2.model.layers.resized_fuse.ResizedFuse object at 0x000001398E79F3D0>, because it is not built.
    W0712 16:09:52.799900 10032 save_impl.py:71] Skipping full serialization of Keras layer <deeplab2.model.layers.resized_fuse.ResizedFuse object at 0x000001398E79F3D0>, because it is not built.
    I0712 16:09:57.817487 10032 deeplab.py:145] Eval with scales ListWrapper([1.0])
    I0712 16:09:57.818226 10032 aspp.py:135] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
    I0712 16:09:57.818226 10032 deeplab.py:153] Eval scale 1.0; setting pooling size to [27, 41]
    I0712 16:09:58.143361 10032 aspp.py:135] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
    I0712 16:10:00.117084 10032 deeplab.py:145] Eval with scales ListWrapper([1.0])
    I0712 16:10:00.118082 10032 aspp.py:135] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
    I0712 16:10:00.118082 10032 deeplab.py:153] Eval scale 1.0; setting pooling size to [27, 41]
    I0712 16:10:00.223799 10032 aspp.py:135] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
    I0712 16:10:01.196901 10032 deeplab.py:145] Eval with scales ListWrapper([1.0])
    I0712 16:10:01.197899 10032 aspp.py:135] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
    I0712 16:10:01.197899 10032 deeplab.py:153] Eval scale 1.0; setting pooling size to [27, 41]
    I0712 16:10:01.820235 10032 aspp.py:135] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
    W0712 16:10:15.740450 10032 save.py:233] Found untraced functions such as semantic_decoder_layer_call_fn, semantic_decoder_layer_call_and_return_conditional_losses, semantic_head_layer_call_fn, semantic_head_layer_call_and_return_conditional_losses, conv1_bn_act_layer_call_fn while saving (showing 5 of 408). These functions will not be directly callable after loading.
    INFO:tensorflow:Assets written to: .\savedmodel\assets
    I0712 16:10:27.721305 10032 builder_impl.py:779] Assets written to: .\savedmodel\assets

    This warning repeats hundreds of times during inference: WARNING:absl:Importing a function (__inference_internal_grad_fn_85642) with ops with unsaved custom gradients. Will likely fail if a gradient is requested

    I'll appreciate your help!
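
    (A debugging sketch for this gap, with hypothetical paths and under the assumption that the exported module is directly callable; check loaded.signatures for what the export actually exposes:)

    import tensorflow as tf

    loaded = tf.saved_model.load('./savedmodel')   # hypothetical export dir
    print(list(loaded.signatures.keys()))          # inspect exposed signatures

    image = tf.io.decode_png(tf.io.read_file('val_image.png'), channels=3)
    # Feed the exported model the SAME resolution/padding the eval pipeline
    # used; a mismatch alone can produce spotty masks like those described.
    outputs = loaded(tf.cast(image, tf.uint8))
    print({k: v.shape for k, v in outputs.items()})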

    opened by fschvart 9
  • usage of the checkpoint throws error

    I want to evaluate Axial-DeepLab. Here is my command: train.py --config_file ../configs/cityscapes/axial_deeplab/max_deeplab_s_backbone_os16.textproto --mode eval --model_dir C:\develop\max_deeplab_s_backbone_os16_axial_deeplab_cityscapes_trainfine\

    I also updated initial_checkpoint to C:\develop\max_deeplab_s_backbone_os16_axial_deeplab_cityscapes_trainfine\ckpt-60000, which is the prefix for both the data and index file downloaded from the official checkpoint.

    When I run the script, I keep getting a warning:

    WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer
    W0201 20:33:08.415140 51764 util.py:204] Unresolved object in checkpoint: (root).optimizer
    ....

    I believe the dataset pattern and experiment name are correct, since there are no errors in the earlier stages; the warning only appears when evaluation starts.

    Am I using the checkpoint in the wrong way? I am confused by the checkpoint path: should it be a directory, one of the files (data, index), or dir/ckpt-60000?
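
    (For what it's worth: in TF2 a checkpoint "path" is the prefix shared by the .index and .data-* files, so dir/ckpt-60000 is the right form, neither the directory nor either file. A minimal sketch, with a stand-in model in place of the built DeepLab2 model:)

    import tensorflow as tf

    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])  # stand-in model
    ckpt = tf.train.Checkpoint(model=model)
    # Restore from the prefix; expect_partial() silences warnings like
    # "Unresolved object in checkpoint: (root).optimizer", which are expected
    # when restoring a train-time checkpoint for evaluation only.
    ckpt.restore(r'C:\develop\max_deeplab_s_backbone_os16_axial_deeplab_cityscapes_trainfine\ckpt-60000').expect_partial()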

    opened by posEdgeOfLife 9
  • Got stuck during training

    Hi, thanks for sharing this great work. I successfully ran the evaluation code for MaX-DeepLab but have issues during training. I used two P40 GPUs to sanity-check the training code with batch size = 2, and I didn't change any other configs. After running the code, I got stuck at "Filling up shuffle buffer".

    The GPU utilization is so low that I can't tell whether it is actually running, and TensorBoard stays blank. I am not familiar with TF2 (especially for this pastiche...); could anyone help me figure out what the problem is? Thank you.

    BTW, is there any way to get a progress bar like tqdm in PyTorch?

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

    I changed _SHUFFLE_BUFFER_SIZE from 1000 to 50. The shuffle buffer now fills fine, but GPU utilization is still very low and TensorBoard is still blank.
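
    (For context, a minimal sketch of the trade-off, with illustrative names rather than DeepLab2's actual input pipeline: shuffle() must read buffer_size elements before yielding the first batch, so a large buffer of heavy panoptic examples can take minutes to fill and look like a hang:)

    import tensorflow as tf

    dataset = tf.data.TFRecordDataset('train.tfrecord')  # hypothetical shard
    dataset = dataset.shuffle(buffer_size=50)  # fills faster than 1000, at the
                                               # cost of weaker shuffling
    dataset = dataset.batch(2).prefetch(tf.data.AUTOTUNE)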

    I set the summary writer to work every step (maybe it's the summary writer? I used TF1 many years ago):

    save_checkpoints_steps: 1000
    save_summaries_steps: 1 #100
    steps_per_loop: 1 #100
    

    And... I am very confused that my GPU utilization depends on the number of GPUs, e.g., 8% for gpu_num=2 and 16% for gpu_num=1, while GPU memory is fully used no matter what the buffer size is.

    I also tried input size 241x241; it doesn't help, and the memory is still full. I think this should be an easy problem, but I am not familiar with TF...

    (py37tf) mcg@msratiranda:~/deeplab2$ python3 trainer/train.py --config_file=configs/coco/max_deeplab/max_deeplab_s_os16_res1025_200k.textproto --mode=train --model_dir=output --num_gpus=2
    2021-06-25 06:55:27.787843: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
    I0625 06:55:29.205785 140604240011456 train.py:65] Reading the config file.
    I0625 06:55:29.208885 140604240011456 train.py:69] Starting the experiment.
    2021-06-25 06:55:29.210546: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
    2021-06-25 06:55:31.068027: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
    pciBusID: 0001:00:00.0 name: Tesla P40 computeCapability: 6.1
    coreClock: 1.531GHz coreCount: 30 deviceMemorySize: 22.38GiB deviceMemoryBandwidth: 323.21GiB/s
    2021-06-25 06:55:31.069245: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
    pciBusID: 0002:00:00.0 name: Tesla P40 computeCapability: 6.1
    coreClock: 1.531GHz coreCount: 30 deviceMemorySize: 22.38GiB deviceMemoryBandwidth: 323.21GiB/s
    2021-06-25 06:55:31.069291: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
    2021-06-25 06:55:31.072768: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
    2021-06-25 06:55:31.072829: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
    2021-06-25 06:55:31.074202: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
    2021-06-25 06:55:31.074513: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
    2021-06-25 06:55:31.077970: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
    2021-06-25 06:55:31.078721: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
    2021-06-25 06:55:31.078880: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
    2021-06-25 06:55:31.083367: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
    2021-06-25 06:55:31.083814: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
    To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
    2021-06-25 06:55:31.468479: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
    pciBusID: 0001:00:00.0 name: Tesla P40 computeCapability: 6.1
    coreClock: 1.531GHz coreCount: 30 deviceMemorySize: 22.38GiB deviceMemoryBandwidth: 323.21GiB/s
    2021-06-25 06:55:31.469669: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
    pciBusID: 0002:00:00.0 name: Tesla P40 computeCapability: 6.1
    coreClock: 1.531GHz coreCount: 30 deviceMemorySize: 22.38GiB deviceMemoryBandwidth: 323.21GiB/s
    2021-06-25 06:55:31.474170: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
    2021-06-25 06:55:31.474253: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
    2021-06-25 06:55:32.357293: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
    2021-06-25 06:55:32.357388: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 1
    2021-06-25 06:55:32.357413: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N N
    2021-06-25 06:55:32.357428: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 1:   N N
    2021-06-25 06:55:32.363370: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22149 MB memory) -> physical GPU (device: 0, name: Tesla P40, pci bus id: 0001:00:00.0, compute capability: 6.1)
    2021-06-25 06:55:32.365513: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 22149 MB memory) -> physical GPU (device: 1, name: Tesla P40, pci bus id: 0002:00:00.0, compute capability: 6.1)
    WARNING:tensorflow:Collective ops is not configured at program startup. Some performance features may not be enabled.
    W0625 06:55:32.369957 140604240011456 mirrored_strategy.py:379] Collective ops is not configured at program startup. Some performance features may not be enabled.
    INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1')
    I0625 06:55:32.867475 140604240011456 mirrored_strategy.py:369] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1')
    I0625 06:55:32.868017 140604240011456 train_lib.py:105] Using strategy <class 'tensorflow.python.distribute.mirrored_strategy.MirroredStrategy'> with 2 replicas
    I0625 06:55:32.875228 140604240011456 deeplab.py:57] Synchronized Batchnorm is used.
    I0625 06:55:32.876093 140604240011456 axial_resnet_instances.py:144] Axial-ResNet final config: {'num_blocks': [3, 4, 6, 3], 'backbone_layer_multiplier': 1.0, 'width_multiplier': 1.0, 'stem_width_multiplier': 1.0, 'output_stride': 16, 'classification_mode': False, 'backbone_type': 'resnet_beta', 'use_axial_beyond_stride': 16, 'backbone_use_transformer_beyond_stride': 32, 'extra_decoder_use_transformer_beyond_stride': 32, 'backbone_decoder_num_stacks': 0, 'backbone_decoder_blocks_per_stage': 1, 'extra_decoder_num_stacks': 0, 'extra_decoder_blocks_per_stage': 1, 'max_num_mask_slots': 128, 'num_mask_slots': 128, 'memory_channels': 256, 'base_transformer_expansion': 1.0, 'global_feed_forward_network_channels': 256, 'high_resolution_output_stride': 4, 'activation': 'relu', 'block_group_config': {'attention_bottleneck_expansion': 2, 'drop_path_keep_prob': 0.800000011920929, 'drop_path_beyond_stride': 16, 'drop_path_schedule': 'linear', 'positional_encoding_type': None, 'use_global_beyond_stride': 0, 'use_sac_beyond_stride': -1, 'use_squeeze_and_excite': False, 'conv_use_recompute_grad': False, 'axial_use_recompute_grad': True, 'recompute_within_stride': 0, 'transformer_use_recompute_grad': False, 'axial_layer_config': {'query_shape': (129, 129), 'key_expansion': 1, 'value_expansion': 2, 'memory_flange': (32, 32), 'double_global_attention': False, 'num_heads': 8, 'use_query_rpe_similarity': True, 'use_key_rpe_similarity': True, 'use_content_similarity': True, 'retrieve_value_rpe': True, 'retrieve_value_content': True, 'initialization_std_for_query_key_rpe': 1.0, 'initialization_std_for_value_rpe': 1.0, 'self_attention_activation': 'softmax'}, 'dual_path_transformer_layer_config': {'num_heads': 8, 'bottleneck_expansion': 2, 'key_expansion': 1, 'value_expansion': 2, 'feed_forward_network_channels': 2048, 'use_memory_self_attention': True, 'use_pixel2memory_feedback_attention': True, 'transformer_activation': 'softmax'}}, 'bn_layer': functools.partial(<class 'tensorflow.python.keras.layers.normalization_v2.SyncBatchNormalization'>, momentum=0.9900000095367432, epsilon=0.0010000000474974513), 'conv_kernel_weight_decay': 0.0}
    I0625 06:55:33.157844 140604240011456 deeplab.py:96] Setting pooling size to (65, 65)
    I0625 06:55:33.158083 140604240011456 aspp.py:135] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
    decode finish
    INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
    I0625 06:55:42.530962 140604240011456 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
    INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
    I0625 06:55:42.532213 140604240011456 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
    INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
    I0625 06:55:42.534660 140604240011456 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
    INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
    I0625 06:55:42.535581 140604240011456 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
    INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
    I0625 06:55:42.538797 140604240011456 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
    INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
    I0625 06:55:42.539653 140604240011456 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
    INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
    I0625 06:55:42.541773 140604240011456 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
    INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
    I0625 06:55:42.542600 140604240011456 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
    INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
    I0625 06:55:42.545866 140604240011456 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
    INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
    I0625 06:55:42.546801 140604240011456 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
    ######### 100
    I0625 06:55:42.571589 140604240011456 controller.py:391] restoring or initializing model...
    restoring or initializing model...
    I0625 06:55:42.608021 140604240011456 controller.py:395] restored model from output/Eval/ckpt-0.
    restored model from output/Eval/ckpt-0.
    I0625 06:55:42.608137 140604240011456 controller.py:217] restored from checkpoint: output/Eval/ckpt-0
    restored from checkpoint: output/Eval/ckpt-0
    I0625 06:55:43.796573 140604240011456 api.py:446] Eval with scales ListWrapper([1.0])
    I0625 06:55:45.063524 140604240011456 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
    I0625 06:55:45.090902 140604240011456 api.py:446] Eval scale 1.0; setting pooling size to [65, 65]
    WARNING:tensorflow:From /home/mcg/miniconda3/envs/py37tf/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py:5049: calling gather (from tensorflow.python.ops.array_ops) with validate_indices is deprecated and will be removed in a future version.
    Instructions for updating:
    The `validate_indices` argument has no effect. Indices are always validated on CPU and never validated on GPU.
    W0625 06:55:48.688872 140604240011456 deprecation.py:534] From /home/mcg/miniconda3/envs/py37tf/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py:5049: calling gather (from tensorflow.python.ops.array_ops) with validate_indices is deprecated and will be removed in a future version.
    Instructions for updating:
    The `validate_indices` argument has no effect. Indices are always validated on CPU and never validated on GPU.
    I0625 06:56:01.794970 140604240011456 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
    I0625 06:56:03.112913 140604240011456 controller.py:236] train | step:      0 | training until step 200000...
    train | step:      0 | training until step 200000...
    2021-06-25 06:56:04.121265: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
    2021-06-25 06:56:04.122489: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2593990000 Hz
    INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
    I0625 06:56:05.927121 140604240011456 cross_device_ops.py:903] batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
    INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
    I0625 06:56:05.949938 140604240011456 cross_device_ops.py:903] batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
    INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
    I0625 06:56:05.972526 140604240011456 cross_device_ops.py:903] batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
    INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
    I0625 06:56:06.089528 140604240011456 cross_device_ops.py:903] batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
    INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
    I0625 06:56:06.111567 140604240011456 cross_device_ops.py:903] batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
    INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
    I0625 06:56:06.133234 140604240011456 cross_device_ops.py:903] batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
    INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
    I0625 06:56:06.252249 140604240011456 cross_device_ops.py:903] batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
    INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
    I0625 06:56:06.278362 140604240011456 cross_device_ops.py:903] batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
    INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
    I0625 06:56:06.300985 140604240011456 cross_device_ops.py:903] batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
    INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
    I0625 06:56:06.431849 140604240011456 cross_device_ops.py:903] batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
    WARNING:tensorflow:From /home/mcg/miniconda3/envs/py37tf/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py:206: calling foldl_v2 (from tensorflow.python.ops.functional_ops) with back_prop=False is deprecated and will be removed in a future version.
    Instructions for updating:
    back_prop=False is deprecated. Consider using tf.stop_gradient instead.
    Instead of:
    results = tf.foldl(fn, elems, back_prop=False)
    Use:
    results = tf.nest.map_structure(tf.stop_gradient, tf.foldl(fn, elems))
    W0625 06:56:43.346125 140596987537152 deprecation.py:601] From /home/mcg/miniconda3/envs/py37tf/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py:206: calling foldl_v2 (from tensorflow.python.ops.functional_ops) with back_prop=False is deprecated and will be removed in a future version.
    Instructions for updating:
    back_prop=False is deprecated. Consider using tf.stop_gradient instead.
    Instead of:
    results = tf.foldl(fn, elems, back_prop=False)
    Use:
    results = tf.nest.map_structure(tf.stop_gradient, tf.foldl(fn, elems))
    WARNING:tensorflow:From /home/mcg/miniconda3/envs/py37tf/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py:463: calling while_loop_v2 (from tensorflow.python.ops.control_flow_ops) with back_prop=False is deprecated and will be removed in a future version.
    Instructions for updating:
    back_prop=False is deprecated. Consider using tf.stop_gradient instead.
    Instead of:
    results = tf.while_loop(c, b, vars, back_prop=False)
    Use:
    results = tf.nest.map_structure(tf.stop_gradient, tf.while_loop(c, b, vars))
    W0625 06:56:43.658312 140596987537152 deprecation.py:601] From /home/mcg/miniconda3/envs/py37tf/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py:463: calling while_loop_v2 (from tensorflow.python.ops.control_flow_ops) with back_prop=False is deprecated and will be removed in a future version.
    Instructions for updating:
    back_prop=False is deprecated. Consider using tf.stop_gradient instead.
    Instead of:
    results = tf.while_loop(c, b, vars, back_prop=False)
    Use:
    results = tf.nest.map_structure(tf.stop_gradient, tf.while_loop(c, b, vars))
    2021-06-25 07:01:32.667195: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
    2021-06-25 07:01:33.971927: I tensorflow/stream_executor/cuda/cuda_dnn.cc:359] Loaded cuDNN version 8100
    2021-06-25 07:01:34.548444: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
    2021-06-25 07:01:34.911529: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
    2021-06-25 07:01:36.659327: I tensorflow/stream_executor/cuda/cuda_dnn.cc:359] Loaded cuDNN version 8100
    2021-06-25 07:01:46.261119: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:177] Filling up shuffle buffer (this may take a while): 561 of 1000
    2021-06-25 07:02:00.735113: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:177] Filling up shuffle buffer (this may take a while): 593 of 1000
    2021-06-25 07:02:02.728721: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:177] Filling up shuffle buffer (this may take a while): 619 of 1000
    2021-06-25 07:02:15.017214: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:177] Filling up shuffle buffer (this may take a while): 625 of 1000
    2021-06-25 07:02:22.714957: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:177] Filling up shuffle buffer (this may take a while): 668 of 1000
    2021-06-25 07:02:34.510389: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:177] Filling up shuffle buffer (this may take a while): 718 of 1000
    2021-06-25 07:02:42.780139: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:177] Filling up shuffle buffer (this may take a while): 776 of 1000
    2021-06-25 07:02:52.867365: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:177] Filling up shuffle buffer (this may take a while): 813 of 1000
    2021-06-25 07:03:04.207901: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:177] Filling up shuffle buffer (this may take a while): 874 of 1000
    2021-06-25 07:03:12.664182: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:177] Filling up shuffle buffer (this may take a while): 923 of 1000
    2021-06-25 07:03:23.321355: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:177] Filling up shuffle buffer (this may take a while): 971 of 1000
    2021-06-25 07:03:28.421338: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:230] Shuffle buffer filled.
    
    opened by lxa9867 9
  • Export model for kMaX-DeepLab fails

    I tried to export kMaX-DeepLab via export_model.py and ran into the following error:

    Traceback (most recent call last):
      File "deeplab2/export_model.py", line 157, in <module>
        app.run(main)
      File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 308, in run
        _run_main(main, args)
      File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 254, in _run_main
        sys.exit(main(argv))
      File "deeplab2/export_model.py", line 152, in main
        tf.saved_model.save(
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/saved_model/save.py", line 1290, in save
        save_and_return_nodes(obj, export_dir, signatures, options)
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/saved_model/save.py", line 1325, in save_and_return_nodes
        _build_meta_graph(obj, signatures, options, meta_graph_def))
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/saved_model/save.py", line 1491, in _build_meta_graph
        return _build_meta_graph_impl(obj, signatures, options, meta_graph_def)
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/saved_model/save.py", line 1443, in _build_meta_graph_impl
        saveable_view = _SaveableView(augmented_graph_view, options)
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/saved_model/save.py", line 229, in __init__
        self.augmented_graph_view.objects_ids_and_slot_variables_and_paths())
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/tracking/graph_view.py", line 544, in objects_ids_and_slot_variables_and_paths
        trackable_objects, node_paths = self._breadth_first_traversal()
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/tracking/graph_view.py", line 255, in _breadth_first_traversal
        for name, dependency in self.list_children(current_trackable):
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/saved_model/save.py", line 143, in list_children
        for name, child in super(_AugmentedGraphView, self).list_children(
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/tracking/graph_view.py", line 203, in list_children
        in obj._trackable_children(save_type, **kwargs).items()]
      File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 3201, in _trackable_children
        children = super(Model, self)._trackable_children(save_type, **kwargs)
      File "/usr/local/lib/python3.8/dist-packages/keras/engine/base_layer.py", line 3174, in _trackable_children
        children = self._trackable_saved_model_saver.trackable_children(cache)
      File "/usr/local/lib/python3.8/dist-packages/keras/saving/saved_model/base_serialization.py", line 59, in trackable_children
        children = self.objects_to_serialize(serialization_cache)
      File "/usr/local/lib/python3.8/dist-packages/keras/saving/saved_model/layer_serialization.py", line 68, in objects_to_serialize
        return (self._get_serialized_attributes(
      File "/usr/local/lib/python3.8/dist-packages/keras/saving/saved_model/layer_serialization.py", line 88, in _get_serialized_attributes
        object_dict, function_dict = self._get_serialized_attributes_internal(
      File "/usr/local/lib/python3.8/dist-packages/keras/saving/saved_model/model_serialization.py", line 56, in _get_serialized_attributes_internal
        super(ModelSavedModelSaver, self)._get_serialized_attributes_internal(
      File "/usr/local/lib/python3.8/dist-packages/keras/saving/saved_model/layer_serialization.py", line 98, in _get_serialized_attributes_internal
        functions = save_impl.wrap_layer_functions(self.obj, serialization_cache)
      File "/usr/local/lib/python3.8/dist-packages/keras/saving/saved_model/save_impl.py", line 149, in wrap_layer_functions
        original_fns = _replace_child_layer_functions(layer, serialization_cache)
      File "/usr/local/lib/python3.8/dist-packages/keras/saving/saved_model/save_impl.py", line 276, in _replace_child_layer_functions
        child_layer._trackable_saved_model_saver._get_serialized_attributes(
      File "/usr/local/lib/python3.8/dist-packages/keras/saving/saved_model/layer_serialization.py", line 88, in _get_serialized_attributes
        object_dict, function_dict = self._get_serialized_attributes_internal(
      File "/usr/local/lib/python3.8/dist-packages/keras/saving/saved_model/model_serialization.py", line 56, in _get_serialized_attributes_internal
        super(ModelSavedModelSaver, self)._get_serialized_attributes_internal(
      File "/usr/local/lib/python3.8/dist-packages/keras/saving/saved_model/layer_serialization.py", line 98, in _get_serialized_attributes_internal
        functions = save_impl.wrap_layer_functions(self.obj, serialization_cache)
      File "/usr/local/lib/python3.8/dist-packages/keras/saving/saved_model/save_impl.py", line 197, in wrap_layer_functions
        fn.get_concrete_function()
      File "/usr/lib/python3.8/contextlib.py", line 120, in __exit__
        next(self.gen)
      File "/usr/local/lib/python3.8/dist-packages/keras/saving/saved_model/save_impl.py", line 359, in tracing_scope
        fn.get_concrete_function(*args, **kwargs)
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/def_function.py", line 1239, in get_concrete_function
        concrete = self._get_concrete_function_garbage_collected(*args, **kwargs)
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/def_function.py", line 1230, in _get_concrete_function_garbage_collected
        concrete = self._stateful_fn._get_concrete_function_garbage_collected(  # pylint: disable=protected-access
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/function.py", line 2533, in _get_concrete_function_garbage_collected
        graph_function, _ = self._maybe_define_function(args, kwargs)
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/function.py", line 2711, in _maybe_define_function
        graph_function = self._create_graph_function(args, kwargs)
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/function.py", line 2627, in _create_graph_function
        func_graph_module.func_graph_from_py_func(
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/func_graph.py", line 1141, in func_graph_from_py_func
        func_outputs = python_func(*func_args, **func_kwargs)
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/def_function.py", line 677, in wrapped_fn
        out = weak_wrapped_fn().__wrapped__(*args, **kwds)
      File "/usr/local/lib/python3.8/dist-packages/keras/saving/saved_model/save_impl.py", line 572, in wrapper
        ret = method(*args, **kwargs)
      File "/usr/local/lib/python3.8/dist-packages/keras/saving/saved_model/utils.py", line 168, in wrap_with_training_arg
        return control_flow_util.smart_cond(
      File "/usr/local/lib/python3.8/dist-packages/keras/utils/control_flow_util.py", line 105, in smart_cond
        return tf.__internal__.smart_cond.smart_cond(
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/smart_cond.py", line 53, in smart_cond
        return true_fn()
      File "/usr/local/lib/python3.8/dist-packages/keras/saving/saved_model/utils.py", line 169, in <lambda>
        training, lambda: replace_training_and_call(True),
      File "/usr/local/lib/python3.8/dist-packages/keras/saving/saved_model/utils.py", line 166, in replace_training_and_call
        return wrapped_call(*args, **kwargs)
      File "/usr/local/lib/python3.8/dist-packages/keras/saving/saved_model/save_impl.py", line 652, in call
        return call_and_return_conditional_losses(inputs, *args, **kwargs)[0]
      File "/usr/local/lib/python3.8/dist-packages/keras/saving/saved_model/save_impl.py", line 610, in __call__
        return self.wrapped_call(*args, **kwargs)
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
        raise e.with_traceback(filtered_tb) from None
      File "/usr/local/lib/python3.8/dist-packages/keras/saving/saved_model/save_impl.py", line 572, in wrapper
        ret = method(*args, **kwargs)
      File "/usr/local/lib/python3.8/dist-packages/keras/saving/saved_model/utils.py", line 168, in wrap_with_training_arg
        return control_flow_util.smart_cond(
      File "/usr/local/lib/python3.8/dist-packages/keras/utils/control_flow_util.py", line 105, in smart_cond
        return tf.__internal__.smart_cond.smart_cond(
      File "/usr/local/lib/python3.8/dist-packages/keras/saving/saved_model/utils.py", line 169, in <lambda>
        training, lambda: replace_training_and_call(True),
      File "/usr/local/lib/python3.8/dist-packages/keras/saving/saved_model/utils.py", line 166, in replace_training_and_call
        return wrapped_call(*args, **kwargs)
      File "/usr/local/lib/python3.8/dist-packages/keras/saving/saved_model/save_impl.py", line 634, in call_and_return_conditional_losses
        call_output = layer_call(*args, **kwargs)
      File "/panoptic_segmentation/deeplab2/model/layers/axial_block_groups.py", line 431, in call
        pixel_space_drop_path_mask = drop_path.generate_drop_path_random_mask(
      File "/panoptic_segmentation/deeplab2/model/layers/drop_path.py", line 78, in generate_drop_path_random_mask
        random_tensor += tf.random.uniform(
    TypeError: Failed to convert elements of (None, 1, 1) to Tensor. Consider casting elements to a supported type. See https://www.tensorflow.org/api_docs/python/tf/dtypes for supported TF dtypes.
    

    I am using kmax_meta_r50_os32.textproto.
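
    (For what it's worth, the TypeError reduces to tf.random.uniform receiving a partially-known static shape, (None, 1, 1), during tracing. A minimal repro and the usual workaround, using the dynamic tf.shape, is shown below; this is not the actual DeepLab2 fix:)

    import tensorflow as tf

    @tf.function(input_signature=[tf.TensorSpec([None, 1, 1], tf.float32)])
    def make_drop_path_mask(x):
        # tf.random.uniform(x.shape) would fail here: x.shape == (None, 1, 1).
        # The dynamic shape tensor from tf.shape(x) handles an unknown batch.
        return tf.random.uniform(tf.shape(x), minval=0.0, maxval=1.0)

    print(make_drop_path_mask(tf.zeros([2, 1, 1])).shape)  # (2, 1, 1)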

    opened by hannes09 7
  • 'Label' out of bounds error during evaluation

    Hi,

    I tried both eval (after running training) and train_and_eval, and got the same error in both cases (after training, of course).

    I use a custom dataset of 2,500 images with panoptic annotations. Training ran without errors (I can't yet tell how good it was). I edited the COCO dataset file, and I use only 2 labels (background and person).

    I'm using Windows 10 and an RTX 3090. Is there something I forgot to change in the settings?

    I'll really appreciate your help!

    This is the error I get:

    I0701 20:15:52.247402 1128 controller.py:276] eval | step: 5000 | running complete evaluation...
    eval | step: 5000 | running complete evaluation...
    I0701 20:15:53.003024 1128 api.py:459] Eval with scales ListWrapper([1.0])
    I0701 20:15:53.006016 1128 api.py:459] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
    I0701 20:15:53.007014 1128 api.py:459] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
    I0701 20:15:53.008012 1128 api.py:459] Eval scale 1.0; setting pooling size to [68, 121]
    I0701 20:15:53.969449 1128 api.py:459] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
    I0701 20:15:53.971444 1128 api.py:459] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
    2022-07-01 20:15:55.516803: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:903] layout failed: INVALID_ARGUMENT: Size of values 3 does not match size of permutation 4 @ fanin shape inDeepLab/PostProcessor/StatefulPartitionedCall/while/body/_85/while/SelectV2_1-1-TransposeNHWCToNCHW-LayoutOptimizer
    2022-07-01 20:16:00.112171: I tensorflow/stream_executor/cuda/cuda_blas.cc:1786] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
    Traceback (most recent call last):
      File "C:\deeplab\deeplab2\trainer\train.py", line 78, in <module>
        app.run(main)
      File "C:\deeplab\venv\lib\site-packages\absl\app.py", line 312, in run
        _run_main(main, args)
      File "C:\deeplab\venv\lib\site-packages\absl\app.py", line 258, in _run_main
        sys.exit(main(argv))
      File "C:\deeplab\deeplab2\trainer\train.py", line 73, in main
        train_lib.run_experiment(FLAGS.mode, config, combined_model_dir, FLAGS.master,
      File "c:\deeplab\deeplab2\trainer\train_lib.py", line 194, in run_experiment
        controller.train_and_evaluate(
      File "c:\deeplab\deeplab2\orbit\controller.py", line 332, in train_and_evaluate
        self.evaluate(steps=eval_steps)
      File "c:\deeplab\deeplab2\orbit\controller.py", line 281, in evaluate
        eval_output = self.evaluator.evaluate(steps_tensor)
      File "c:\deeplab\deeplab2\orbit\standard_runner.py", line 346, in evaluate
        outputs = self._eval_loop_fn(
      File "c:\deeplab\deeplab2\orbit\utils\loop_fns.py", line 75, in loop_fn
        outputs = step_fn(iterator)
      File "C:\deeplab\venv\lib\site-packages\tensorflow\python\util\traceback_utils.py", line 153, in error_handler
        raise e.with_traceback(filtered_tb) from None
      File "C:\deeplab\venv\lib\site-packages\tensorflow\python\eager\execute.py", line 54, in quick_execute
        tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:

    Detected at node 'confusion_matrix/assert_less/Assert/AssertGuard/Assert' defined at (most recent call last):
      File "C:\deeplab\deeplab2\trainer\train.py", line 78, in <module>
        app.run(main)
      File "C:\deeplab\venv\lib\site-packages\absl\app.py", line 312, in run
        _run_main(main, args)
      File "C:\deeplab\venv\lib\site-packages\absl\app.py", line 258, in _run_main
        sys.exit(main(argv))
      File "C:\deeplab\deeplab2\trainer\train.py", line 73, in main
        train_lib.run_experiment(FLAGS.mode, config, combined_model_dir, FLAGS.master,
      File "c:\deeplab\deeplab2\trainer\train_lib.py", line 194, in run_experiment
        controller.train_and_evaluate(
      File "c:\deeplab\deeplab2\orbit\controller.py", line 332, in train_and_evaluate
        self.evaluate(steps=eval_steps)
      File "c:\deeplab\deeplab2\orbit\controller.py", line 281, in evaluate
        eval_output = self.evaluator.evaluate(steps_tensor)
      File "c:\deeplab\deeplab2\orbit\standard_runner.py", line 346, in evaluate
        outputs = self._eval_loop_fn(
      File "c:\deeplab\deeplab2\orbit\utils\loop_fns.py", line 75, in loop_fn
        outputs = step_fn(iterator)
      File "c:\deeplab\deeplab2\trainer\evaluator.py", line 181, in eval_step
        distributed_outputs = self._strategy.run(step_fn, args=(next(iterator),))
      File "c:\deeplab\deeplab2\trainer\evaluator.py", line 178, in step_fn
        step_outputs = self._eval_step(inputs)
      File "c:\deeplab\deeplab2\trainer\evaluator.py", line 199, in _eval_step
        if self._decode_groundtruth_label:
      File "c:\deeplab\deeplab2\trainer\evaluator.py", line 214, in _eval_step
        self._eval_iou_metric.update_state(
      File "C:\deeplab\venv\lib\site-packages\keras\utils\metrics_utils.py", line 70, in decorated
        update_op = update_state_fn(*args, **kwargs)
      File "C:\deeplab\venv\lib\site-packages\keras\metrics\base_metric.py", line 140, in update_state_fn
        return ag_update_state(*args, **kwargs)
      File "C:\deeplab\venv\lib\site-packages\keras\metrics\metrics.py", line 2494, in update_state
        current_cm = tf.math.confusion_matrix(
    Node: 'confusion_matrix/assert_less/Assert/AssertGuard/Assert'
    Detected at node 'confusion_matrix/assert_less/Assert/AssertGuard/Assert' defined at (most recent call last):
      File "C:\deeplab\deeplab2\trainer\train.py", line 78, in <module>
        app.run(main)
      File "C:\deeplab\venv\lib\site-packages\absl\app.py", line 312, in run
        _run_main(main, args)
      File "C:\deeplab\venv\lib\site-packages\absl\app.py", line 258, in _run_main
        sys.exit(main(argv))
      File "C:\deeplab\deeplab2\trainer\train.py", line 73, in main
        train_lib.run_experiment(FLAGS.mode, config, combined_model_dir, FLAGS.master,
      File "c:\deeplab\deeplab2\trainer\train_lib.py", line 194, in run_experiment
        controller.train_and_evaluate(
      File "c:\deeplab\deeplab2\orbit\controller.py", line 332, in train_and_evaluate
        self.evaluate(steps=eval_steps)
      File "c:\deeplab\deeplab2\orbit\controller.py", line 281, in evaluate
        eval_output = self.evaluator.evaluate(steps_tensor)
      File "c:\deeplab\deeplab2\orbit\standard_runner.py", line 346, in evaluate
        outputs = self._eval_loop_fn(
      File "c:\deeplab\deeplab2\orbit\utils\loop_fns.py", line 75, in loop_fn
        outputs = step_fn(iterator)
      File "c:\deeplab\deeplab2\trainer\evaluator.py", line 181, in eval_step
        distributed_outputs = self._strategy.run(step_fn, args=(next(iterator),))
      File "c:\deeplab\deeplab2\trainer\evaluator.py", line 178, in step_fn
        step_outputs = self._eval_step(inputs)
      File "c:\deeplab\deeplab2\trainer\evaluator.py", line 199, in _eval_step
        if self._decode_groundtruth_label:
      File "c:\deeplab\deeplab2\trainer\evaluator.py", line 214, in _eval_step
        self._eval_iou_metric.update_state(
      File "C:\deeplab\venv\lib\site-packages\keras\utils\metrics_utils.py", line 70, in decorated
        update_op = update_state_fn(*args, **kwargs)
      File "C:\deeplab\venv\lib\site-packages\keras\metrics\base_metric.py", line 140, in update_state_fn
        return ag_update_state(*args, **kwargs)
      File "C:\deeplab\venv\lib\site-packages\keras\metrics\metrics.py", line 2494, in update_state
        current_cm = tf.math.confusion_matrix(
    Node: 'confusion_matrix/assert_less/Assert/AssertGuard/Assert'
    2 root error(s) found.
      (0) INVALID_ARGUMENT: assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (confusion_matrix/Cast_2:0) = ] [2]
         [[{{node confusion_matrix/assert_less/Assert/AssertGuard/Assert}}]]
         [[DeepLab/PostProcessor/StatefulPartitionedCall/PartitionedCall/while_1/body/_299/while_1/cond_1/then/_611/while_1/cond_1/cond_1/then/_700/while_1/cond_1/cond_1/while/loop_counter/_202]]
      (1) INVALID_ARGUMENT: assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (confusion_matrix/Cast_2:0) = ] [2]
         [[{{node confusion_matrix/assert_less/Assert/AssertGuard/Assert}}]]
    0 successful operations. 0 derived errors ignored. [Op:__inference_eval_step_77340]

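    (A sanity check that usually localizes this error: every semantic ID fed to the confusion matrix must be smaller than num_classes, here 2. A minimal sketch, assuming your combined panoptic map uses the semantic * label_divisor + instance encoding that DeepLab2 uses; adjust label_divisor and the decoding to your dataset:)

    import numpy as np
    from PIL import Image

    label_divisor = 256  # must match panoptic_label_divisor in the dataset config
    panoptic = np.array(Image.open('example_panoptic_label.png'))  # hypothetical
    semantic = panoptic // label_divisor
    print(np.unique(semantic))  # any value >= num_classes (or an unmapped
                                # ignore label) triggers "labels out of bound"
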
    opened by fschvart 6
  • How to segment specific objects on pre-trained model and store annotations?

    I'm running the provided Google Colab notebook and am able to get the desired segmentations. However, I'm only interested in segmenting the 'pole' and 'traffic sign' classes from the Cityscapes dataset, ignoring all other classes, and I ultimately want to obtain the segmentation annotations so that I can mask those objects for another application.

    Is there a quick way to do so?

    Thank you.
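
    There is no built-in flag for this in the demo, but it is straightforward in numpy after prediction. A minimal sketch, assuming the notebook gives you a 2-D semantic prediction of Cityscapes trainIds (pole = 5, traffic sign = 7 in the standard mapping; verify against the label map your checkpoint uses):

        import numpy as np

        # TrainIds below follow the standard Cityscapes mapping; double-check
        # against the label map used by your checkpoint.
        POLE, TRAFFIC_SIGN = 5, 7

        def mask_classes(semantic_pred, class_ids=(POLE, TRAFFIC_SIGN)):
            """Returns a boolean mask that is True only for the requested classes."""
            return np.isin(semantic_pred, class_ids)

        def apply_mask(image, semantic_pred):
            """Zeros out everything except the requested classes in the image.

            image: HxWx3 uint8 array aligned with the 2-D semantic_pred.
            """
            mask = mask_classes(semantic_pred)
            return image * mask[..., np.newaxis].astype(image.dtype)

    The boolean mask itself can be saved (e.g., as a PNG) to serve as the annotation for the downstream application.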

    opened by varungupta31 6
  • Exporting MaX-DeepLab - iterating over `tf.Tensor` is not allowed: AutoGraph did convert this function.

    Hey guys,

    Thanks for your awesome work in this repo.

    Getting the following error when attempting to export a Max-DeepLab model:

    tensorflow.python.framework.errors_impl.OperatorNotAllowedInGraphError: 
    iterating over `tf.Tensor` is not allowed: AutoGraph did convert this function. This might indicate you are trying to use an unsupported feature.
    

    The error is raised from this line while attempting to iterate over a Tensor("strided_slice:0", shape=(), dtype=int32): https://github.com/google-research/deeplab2/blob/main/model/post_processor/max_deeplab.py#L389

    I'm using TensorFlow 2.6.0; this also happens with TF 2.5.0.

    I've also attempted to export max_deeplab_s_os16_res641_400k using the provided config and checkpoint and got the same error.

    I'll keep investigating whether it's an issue with my environment.
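
    For reference, this error typically comes from a Python for loop over a scalar tensor inside a tf.function; the usual workaround is to loop over tf.range(n) instead. A minimal sketch of the pattern (illustrative only, not the repository's actual code):

        import tensorflow as tf

        @tf.function
        def broken(n):  # n is a scalar int32 Tensor
            total = 0
            for i in n:  # raises OperatorNotAllowedInGraphError: a scalar is not iterable
                total += i
            return total

        @tf.function
        def fixed(n):
            total = tf.constant(0)
            for i in tf.range(n):  # AutoGraph converts this into a graph-friendly loop
                total += i
            return total

        print(fixed(tf.constant(5)))  # tf.Tensor(10, shape=(), dtype=int32)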

    opened by louisquinn 6
  • Error for eval for KITTI-STEP Video Panoptic Segmentation

    Following deeplab2/g3doc/projects/motion_deeplab.md, I tried to evaluate Panoptic-DeepLab on the KITTI-STEP dataset. The command is as follows:

        python train.py --config_file=../configs/kitti/panoptic_deeplab/resnet50_os32_trainval.textproto --mode=eval --model_dir=/home/ubuntu/dev/sda1/model/model_yunjian/motion_deeplab/single_frame_ckpt --num_gpus=0

    This produced the following error (the OP_REQUIRES warning was interleaved with the traceback many times; only one copy is kept below):

        2021-07-07 21:58:00.839820: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at example_parsing_ops.cc:94 : Invalid argument: Feature: image/segmentation/class/encoded (data type: string) is required but could not be found.

        Traceback (most recent call last):
          File "/home/ubuntu/yunjian/motion_deeplab/deeplab2/trainer/train.py", line 76, in <module>
            app.run(main)
          File "/home/ubuntu/anaconda3/envs/motion_deeplab/lib/python3.6/site-packages/absl/app.py", line 312, in run
            _run_main(main, args)
          File "/home/ubuntu/anaconda3/envs/motion_deeplab/lib/python3.6/site-packages/absl/app.py", line 258, in _run_main
            sys.exit(main(argv))
          File "/home/ubuntu/yunjian/motion_deeplab/deeplab2/trainer/train.py", line 72, in main
            FLAGS.num_gpus)
          File "/home/ubuntu/yunjian/motion_deeplab/deeplab2/trainer/train_lib.py", line 200, in run_experiment
            controller.evaluate(steps=config.evaluator_options.eval_steps)
          File "/home/ubuntu/yunjian/motion_deeplab/models/orbit/controller.py", line 282, in evaluate
            eval_output = self.evaluator.evaluate(steps_tensor)
          File "/home/ubuntu/yunjian/motion_deeplab/models/orbit/standard_runner.py", line 344, in evaluate
            eval_iter, num_steps, state=outputs, reduce_fn=self.eval_reduce)
          File "/home/ubuntu/yunjian/motion_deeplab/models/orbit/utils/loop_fns.py", line 74, in loop_fn
            outputs = step_fn(iterator)
          File "/home/ubuntu/anaconda3/envs/motion_deeplab/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 780, in call
            result = self._call(*args, **kwds)
          File "/home/ubuntu/anaconda3/envs/motion_deeplab/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 840, in _call
            return self._stateless_fn(*args, **kwds)
          File "/home/ubuntu/anaconda3/envs/motion_deeplab/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2829, in call
            return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
          File "/home/ubuntu/anaconda3/envs/motion_deeplab/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1848, in _filtered_call
            cancellation_manager=cancellation_manager)
          File "/home/ubuntu/anaconda3/envs/motion_deeplab/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1924, in _call_flat
            ctx, args, cancellation_manager=cancellation_manager))
          File "/home/ubuntu/anaconda3/envs/motion_deeplab/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 550, in call
            ctx=ctx)
          File "/home/ubuntu/anaconda3/envs/motion_deeplab/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
            inputs, attrs, num_outputs)
        tensorflow.python.framework.errors_impl.InvalidArgumentError: Feature: image/segmentation/class/encoded (data type: string) is required but could not be found.
          [[{{node ParseSingleExample/ParseExample/ParseExampleV2}}]]
          [[MultiDeviceIteratorGetNextFromShard]]
          [[RemoteCall]]
          [[IteratorGetNext]] [Op:__inference_eval_step_11410]

    Function call stack: eval_step

    Process finished with exit code 1

    My machine configuration: CUDA 10.0.130, cuDNN 7.6.5, TensorFlow 2.3.0. When I run 'deeplab2/compile.sh gpu', it returns 'Done with configuration!'.
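
    When this error appears, the quickest check is whether the eval TFRecords actually contain the ground-truth feature the parser asks for. A minimal sketch (the shard path is hypothetical):

        import tensorflow as tf

        # Hypothetical path: use one of the KITTI-STEP shards you generated.
        for raw in tf.data.TFRecordDataset('kitti-step_val-00000-of-00010.tfrecord').take(1):
            example = tf.train.Example.FromString(raw.numpy())
            for key in sorted(example.features.feature):
                print(key)
            # 'image/segmentation/class/encoded' must be listed for eval mode;
            # if it is missing, the TFRecords were built without ground truth.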

    opened by yjlin0223 6
  • Running Panoptic Segmentation on cityscapes using the "mobilenet_v3_large_os32.textproto" config file provided #102

    System Details:

    OS: Ubuntu 20.04

    Python 3.9.12

    Tensorflow 2.6.0

    Note: after commenting out the line "assert backbone_options.use_squeeze_and_excite" in model/builder.py, training starts, but I then get the warning "Squeeze and Excitation is skipped due to undefined se_ratio".

    Hello deeplab team,

    After running the following command:

        python trainer/train.py --config_file=configs/cityscapes/panoptic_deeplab/mobilenet_v3_large_os32.textproto --mode=train --model_dir=Model_Output/ --num_gpus=1

    I get the following error log:

    2022-06-10 12:24:53.117800: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    (the NUMA message above is repeated several more times)
    I0610 12:24:53.150171 139902151967936 train.py:65] Reading the config file.
    I0610 12:24:53.152600 139902151967936 train.py:69] Starting the experiment.
    2022-06-10 12:24:53.152945: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA. To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
    2022-06-10 12:24:53.533883: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 87 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3080 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
    I0610 12:24:53.535348 139902151967936 train_lib.py:104] Using strategy <class 'tensorflow.python.distribute.one_device_strategy.OneDeviceStrategy'> with 1 replicas
    I0610 12:24:53.540082 139902151967936 deeplab.py:57] Synchronized Batchnorm is used.

    Traceback (most recent call last):
      File "/home/prabal/MachineLearning/MLAlgorithms/SemanticSegmentation/deeplab2/trainer/train.py", line 76, in <module>
        app.run(main)
      File "/home/prabal/anaconda3/lib/python3.9/site-packages/absl/app.py", line 312, in run
        _run_main(main, args)
      File "/home/prabal/anaconda3/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main
        sys.exit(main(argv))
      File "/home/prabal/MachineLearning/MLAlgorithms/SemanticSegmentation/deeplab2/trainer/train.py", line 71, in main
        train_lib.run_experiment(FLAGS.mode, config, combined_model_dir, FLAGS.master,
      File "/home/prabal/MachineLearning/MLAlgorithms/SemanticSegmentation/deeplab2/trainer/train_lib.py", line 127, in run_experiment
        deeplab_model = create_deeplab_model(
      File "/home/prabal/MachineLearning/MLAlgorithms/SemanticSegmentation/deeplab2/trainer/train_lib.py", line 57, in create_deeplab_model
        return deeplab.DeepLab(config, dataset_descriptor)
      File "/home/prabal/MachineLearning/MLAlgorithms/SemanticSegmentation/deeplab2/model/deeplab.py", line 77, in __init__
        self._encoder = builder.create_encoder(
      File "/home/prabal/MachineLearning/MLAlgorithms/SemanticSegmentation/deeplab2/model/builder.py", line 57, in create_encoder
        return create_mobilenet_encoder(
      File "/home/prabal/MachineLearning/MLAlgorithms/SemanticSegmentation/deeplab2/model/builder.py", line 87, in create_mobilenet_encoder
        assert backbone_options.use_squeeze_and_excite
    AssertionError

    opened by prabal27 5
  • ValueError: Dimensions must be equal

    Hi, I've been trying to train on the COCO dataset using Panoptic-DeepLab with the resnet50_os32 config, but I get the following error:

    ValueError: Dimensions must be equal, but are 641 and 161 for '{{node DeepLabFamilyLoss/TopKGeneralLoss/sub}} = Sub[T=DT_FLOAT](IteratorGetNext:2, DeepLab/Squeeze)' with input shapes: [64,641,641], [64,161,161].

    Could you please help me point to the source of the issue?
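
    For what it's worth, the two sizes are consistent with a factor-of-4 resolution gap: with a stride-4 prediction head, a 641x641 input yields ceil(641 / 4) = 161, so the loss appears to compare full-resolution labels against quarter-resolution predictions (e.g., an upsampling step being skipped). A quick sanity check:

        import math
        assert math.ceil(641 / 4) == 161  # 641x641 labels vs. stride-4 161x161 logits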

    opened by pseudo-swati 0
  • Code compatibility with python < 3.9

    It seems that the files "deeplab2/model/layers/moat_blocks.py" and "deeplab2/model/pixel_encoder/moat.py" use list[int] and List[int] interchangeably in function definitions. From what I can gather, this works in Python 3.9 but not in earlier versions. I was able to make the code run on Python 3.8 by editing the files to consistently use List[int] (and the corresponding Dict types), importing List and Dict from typing.
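
    A minimal illustration of the fix described above; subscripting built-in containers like list[int] in annotations only works from Python 3.9 (PEP 585), while the typing aliases work on 3.8 and earlier (the function below is hypothetical):

        from typing import List

        # Python >= 3.9 only; raises TypeError at import time on 3.8:
        #   def double_all(values: list[int]) -> list[int]: ...

        # Portable to Python 3.8 and earlier:
        def double_all(values: List[int]) -> List[int]:
            return [2 * v for v in values]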

    opened by brendonlutnick 0
  • Logits and scores of semantic prediction

    The model output for the semantic predictions is in the format:

    semantic logits: (batch, 81, 81, num_classes)   (for a crop_size of 321)
    semantic scores: (batch, img_height, img_width, num_classes)

    How can I get the semantic logits in the same shape as the semantic scores?
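
    If the goal is logits at the same resolution as the scores, one option is to bilinearly resize them yourself, which mirrors how scores are usually produced (resize, then softmax). A minimal sketch, assuming semantic_logits is the (batch, 81, 81, num_classes) tensor described above; the library's internal resizing may differ in detail:

        import tensorflow as tf

        def logits_at_image_size(semantic_logits, img_height, img_width):
            """Bilinearly upsamples logits to the input image resolution."""
            return tf.image.resize(semantic_logits, [img_height, img_width],
                                   method=tf.image.ResizeMethod.BILINEAR)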

    opened by sonukiller 0
  • Op type not registered 'MergeSemanticAndInstanceMaps' in binary running on wvmgputprseus

    I have exported a panoptic_deeplab model. I get the following error while loading it with tf.saved_model.load():

        NotFoundError: Op type not registered 'MergeSemanticAndInstanceMaps' in binary running on wvmgputprseus. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) tf.contrib.resampler should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
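
    MergeSemanticAndInstanceMaps is a DeepLab2 custom op, so the module that registers it has to be imported (or its shared library loaded) before calling tf.saved_model.load(). A sketch under that assumption; the exact module or .so path depends on how compile.sh built it in your checkout, so both paths below are indicative only:

        import tensorflow as tf

        # Option 1: import the Python wrapper that registers the op
        # (module path is indicative; it may differ in your build).
        # from deeplab2.tensorflow_ops.python.ops import merge_semantic_and_instance_maps_op

        # Option 2: load the compiled kernel directly (path is hypothetical).
        tf.load_op_library('deeplab2/tensorflow_ops/kernels/merge_semantic_and_instance_maps.so')

        model = tf.saved_model.load('exported_panoptic_deeplab')  # hypothetical export dir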

    opened by ShivaThe 0
  • [ViP-DeepLab] Add wSTQ in numpy for PVPS dataset.

    This PR adds a wSTQ implementation in numpy, along with a unit test that guarantees compatibility between the tf and numpy implementations. It also fixes a dtype error in the tf implementation.
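
    A generic shape for such a parity test (a sketch, not the PR's actual code): run both implementations on identical random inputs and compare within tolerance; the two entry points below are placeholders for the numpy and tf wSTQ functions:

        import numpy as np
        import tensorflow as tf

        def check_parity(np_wstq_fn, tf_wstq_fn, num_trials=5, seed=0):
            """Asserts the numpy and tf implementations agree on random inputs."""
            rng = np.random.default_rng(seed)
            for _ in range(num_trials):
                pred = rng.integers(0, 10, size=(4, 64, 64))
                gt = rng.integers(0, 10, size=(4, 64, 64))
                np_result = np.asarray(np_wstq_fn(pred, gt))
                tf_result = tf_wstq_fn(tf.constant(pred), tf.constant(gt)).numpy()
                np.testing.assert_allclose(np_result, tf_result, rtol=1e-6)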

    opened by meijieru 5
  • Unstable numeric output for downstream task (moat 4 w/o pos)

    ckpts : moat4 w/o pos

    The output from moat4 easily causes the following layers (e.g., a 3x3 conv) to produce NaN outputs.

    The same issue does not show up with moat0, at least.

    This is the first time in my career I have met an issue like this (I have hit NaNs many times, but never like this), so I need some time to investigate it.

    I will update this issue if I have new findings. Please also check whether the provided ckpts are working.
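
    One generic way to localize where the NaNs first appear is TensorFlow's numeric checking, which raises with the offending op's name as soon as a NaN/Inf is produced (the tensor name below is hypothetical):

        import tensorflow as tf

        # Global: raises on the first op that produces a NaN/Inf anywhere.
        tf.debugging.enable_check_numerics()

        # Targeted: wrap a suspect tensor, e.g. the backbone features
        # feeding the 3x3 conv mentioned above.
        # features = tf.debugging.check_numerics(features, 'moat4 backbone output')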

    opened by edwardyehuang 1