Hi,
Thanks for sharing this great work. I successfully ran the evaluation code for MaX-DeepLab, but I have issues during training. I'm using two P40 GPUs to sanity-check the training code with batch_size=2, and I didn't change any other configs. After starting the run, I got stuck at "shuffle buffer filled".
GPU utilization is so low that I can't tell whether training is actually running, and TensorBoard stays blank.
I am not familiar with TF2 (especially with this codebase...), so could anyone help me figure out what the problem is? Thank you.
BTW, is there any way to get a progress bar like tqdm in PyTorch?
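To be concrete, this is roughly the kind of loop I'm hoping for. It's only a sketch: `dataset`, `num_steps`, and `train_step` here are toy stand-ins I made up, not the repo's real names.

```python
import tensorflow as tf
from tqdm import tqdm

# Hypothetical stand-ins for the repo's real input pipeline and train step.
dataset = tf.data.Dataset.range(100).batch(2)
num_steps = 50

def train_step(batch):  # placeholder training step
    return float(tf.reduce_sum(batch))

for step, batch in enumerate(tqdm(dataset, total=num_steps, desc='train')):
    loss = train_step(batch)
    if step % 10 == 0:
        tqdm.write(f'step {step}: loss {loss:.3f}')  # prints without breaking the bar
```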
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
I changed _SHUFFLE_BUFFER_SIZE from 1000 to 50, and the "shuffle buffer filled" step completes quickly now.
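For context, as far as I can tell this constant just becomes the buffer_size of tf.data's shuffle, so a smaller value only shortens the initial buffer fill (at the cost of weaker shuffling). Roughly (a toy sketch, not the repo's actual TFRecord pipeline):

```python
import tensorflow as tf

_SHUFFLE_BUFFER_SIZE = 50  # was 1000

# Toy stand-in for the real pipeline; the point is only that the constant
# is the buffer_size argument of shuffle().
dataset = tf.data.Dataset.range(1000)
dataset = dataset.shuffle(buffer_size=_SHUFFLE_BUFFER_SIZE)
dataset = dataset.batch(2).prefetch(tf.data.AUTOTUNE)
```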
But GPU utilization is still very low and TensorBoard is still blank.
I set the summary writer to write every step (maybe the summary writer is the issue? I last used TF1 many years ago):
save_checkpoints_steps: 1000
save_summaries_steps: 1 #100
steps_per_loop: 1 #100
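From what I remember of TF1 versus what I see in the TF2 docs, per-step summaries would now look roughly like this. This is just my sketch of the tf.summary API, not this repo's trainer code, and 'output/train' is a guess at the event directory under my model_dir:

```python
import tensorflow as tf

# Sketch of a TF2 summary writer that flushes every step.
writer = tf.summary.create_file_writer('output/train')
with writer.as_default():
    for step in range(10):
        loss = 0.0  # placeholder for the real training loss
        tf.summary.scalar('loss', loss, step=step)
        writer.flush()  # push events to disk so TensorBoard can display them
```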
And... I am very confused that my per-GPU utilization seems tied to the GPU count, e.g. ~8% with --num_gpus=2 and ~16% with --num_gpus=1, while GPU memory is fully used no matter what the buffer size is.
I also tried an input size of 241x241, but it doesn't help: the memory is still full. I think this should be an easy problem, but I am not familiar with TF...
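One thing worth noting for anyone reading: TF pre-allocates essentially all GPU memory by default, so nvidia-smi showing full memory says nothing about actual load. If I understand correctly, memory growth can be enabled like this (a sketch; it must run before any op touches the GPU):

```python
import tensorflow as tf

# With memory growth enabled, reported GPU memory reflects real usage
# instead of TF's default grab-everything allocation.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)
```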
(py37tf) mcg@msratiranda:~/deeplab2$ python3 trainer/train.py --config_file=configs/coco/max_deeplab/max_deeplab_s_os16_res1025_200k.textproto --mode=train --model_dir=output --num_gpus=2
2021-06-25 06:55:27.787843: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
I0625 06:55:29.205785 140604240011456 train.py:65] Reading the config file.
I0625 06:55:29.208885 140604240011456 train.py:69] Starting the experiment.
2021-06-25 06:55:29.210546: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-06-25 06:55:31.068027: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0001:00:00.0 name: Tesla P40 computeCapability: 6.1
coreClock: 1.531GHz coreCount: 30 deviceMemorySize: 22.38GiB deviceMemoryBandwidth: 323.21GiB/s
2021-06-25 06:55:31.069245: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0002:00:00.0 name: Tesla P40 computeCapability: 6.1
coreClock: 1.531GHz coreCount: 30 deviceMemorySize: 22.38GiB deviceMemoryBandwidth: 323.21GiB/s
2021-06-25 06:55:31.069291: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-06-25 06:55:31.072768: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-06-25 06:55:31.072829: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-06-25 06:55:31.074202: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-06-25 06:55:31.074513: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-06-25 06:55:31.077970: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-06-25 06:55:31.078721: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-06-25 06:55:31.078880: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-06-25 06:55:31.083367: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
2021-06-25 06:55:31.083814: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-06-25 06:55:31.468479: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0001:00:00.0 name: Tesla P40 computeCapability: 6.1
coreClock: 1.531GHz coreCount: 30 deviceMemorySize: 22.38GiB deviceMemoryBandwidth: 323.21GiB/s
2021-06-25 06:55:31.469669: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0002:00:00.0 name: Tesla P40 computeCapability: 6.1
coreClock: 1.531GHz coreCount: 30 deviceMemorySize: 22.38GiB deviceMemoryBandwidth: 323.21GiB/s
2021-06-25 06:55:31.474170: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
2021-06-25 06:55:31.474253: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-06-25 06:55:32.357293: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-06-25 06:55:32.357388: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0 1
2021-06-25 06:55:32.357413: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N N
2021-06-25 06:55:32.357428: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 1: N N
2021-06-25 06:55:32.363370: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22149 MB memory) -> physical GPU (device: 0, name: Tesla P40, pci bus id: 0001:00:00.0, compute capability: 6.1)
2021-06-25 06:55:32.365513: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 22149 MB memory) -> physical GPU (device: 1, name: Tesla P40, pci bus id: 0002:00:00.0, compute capability: 6.1)
WARNING:tensorflow:Collective ops is not configured at program startup. Some performance features may not be enabled.
W0625 06:55:32.369957 140604240011456 mirrored_strategy.py:379] Collective ops is not configured at program startup. Some performance features may not be enabled.
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1')
I0625 06:55:32.867475 140604240011456 mirrored_strategy.py:369] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1')
I0625 06:55:32.868017 140604240011456 train_lib.py:105] Using strategy <class 'tensorflow.python.distribute.mirrored_strategy.MirroredStrategy'> with 2 replicas
I0625 06:55:32.875228 140604240011456 deeplab.py:57] Synchronized Batchnorm is used.
I0625 06:55:32.876093 140604240011456 axial_resnet_instances.py:144] Axial-ResNet final config: {'num_blocks': [3, 4, 6, 3], 'backbone_layer_multiplier': 1.0, 'width_multiplier': 1.0, 'stem_width_multiplier': 1.0, 'output_stride': 16, 'classification_mode': False, 'backbone_type': 'resnet_beta', 'use_axial_beyond_stride': 16, 'backbone_use_transformer_beyond_stride': 32, 'extra_decoder_use_transformer_beyond_stride': 32, 'backbone_decoder_num_stacks': 0, 'backbone_decoder_blocks_per_stage': 1, 'extra_decoder_num_stacks': 0, 'extra_decoder_blocks_per_stage': 1, 'max_num_mask_slots': 128, 'num_mask_slots': 128, 'memory_channels': 256, 'base_transformer_expansion': 1.0, 'global_feed_forward_network_channels': 256, 'high_resolution_output_stride': 4, 'activation': 'relu', 'block_group_config': {'attention_bottleneck_expansion': 2, 'drop_path_keep_prob': 0.800000011920929, 'drop_path_beyond_stride': 16, 'drop_path_schedule': 'linear', 'positional_encoding_type': None, 'use_global_beyond_stride': 0, 'use_sac_beyond_stride': -1, 'use_squeeze_and_excite': False, 'conv_use_recompute_grad': False, 'axial_use_recompute_grad': True, 'recompute_within_stride': 0, 'transformer_use_recompute_grad': False, 'axial_layer_config': {'query_shape': (129, 129), 'key_expansion': 1, 'value_expansion': 2, 'memory_flange': (32, 32), 'double_global_attention': False, 'num_heads': 8, 'use_query_rpe_similarity': True, 'use_key_rpe_similarity': True, 'use_content_similarity': True, 'retrieve_value_rpe': True, 'retrieve_value_content': True, 'initialization_std_for_query_key_rpe': 1.0, 'initialization_std_for_value_rpe': 1.0, 'self_attention_activation': 'softmax'}, 'dual_path_transformer_layer_config': {'num_heads': 8, 'bottleneck_expansion': 2, 'key_expansion': 1, 'value_expansion': 2, 'feed_forward_network_channels': 2048, 'use_memory_self_attention': True, 'use_pixel2memory_feedback_attention': True, 'transformer_activation': 'softmax'}}, 'bn_layer': functools.partial(<class 'tensorflow.python.keras.layers.normalization_v2.SyncBatchNormalization'>, momentum=0.9900000095367432, epsilon=0.0010000000474974513), 'conv_kernel_weight_decay': 0.0}
I0625 06:55:33.157844 140604240011456 deeplab.py:96] Setting pooling size to (65, 65)
I0625 06:55:33.158083 140604240011456 aspp.py:135] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
decode finish
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0625 06:55:42.530962 140604240011456 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0625 06:55:42.532213 140604240011456 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0625 06:55:42.534660 140604240011456 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0625 06:55:42.535581 140604240011456 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0625 06:55:42.538797 140604240011456 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0625 06:55:42.539653 140604240011456 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0625 06:55:42.541773 140604240011456 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0625 06:55:42.542600 140604240011456 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0625 06:55:42.545866 140604240011456 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0625 06:55:42.546801 140604240011456 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
######### 100
I0625 06:55:42.571589 140604240011456 controller.py:391] restoring or initializing model...
restoring or initializing model...
I0625 06:55:42.608021 140604240011456 controller.py:395] restored model from output/Eval/ckpt-0.
restored model from output/Eval/ckpt-0.
I0625 06:55:42.608137 140604240011456 controller.py:217] restored from checkpoint: output/Eval/ckpt-0
restored from checkpoint: output/Eval/ckpt-0
I0625 06:55:43.796573 140604240011456 api.py:446] Eval with scales ListWrapper([1.0])
I0625 06:55:45.063524 140604240011456 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
I0625 06:55:45.090902 140604240011456 api.py:446] Eval scale 1.0; setting pooling size to [65, 65]
WARNING:tensorflow:From /home/mcg/miniconda3/envs/py37tf/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py:5049: calling gather (from tensorflow.python.ops.array_ops) with validate_indices is deprecated and will be removed in a future version.
Instructions for updating:
The `validate_indices` argument has no effect. Indices are always validated on CPU and never validated on GPU.
W0625 06:55:48.688872 140604240011456 deprecation.py:534] From /home/mcg/miniconda3/envs/py37tf/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py:5049: calling gather (from tensorflow.python.ops.array_ops) with validate_indices is deprecated and will be removed in a future version.
Instructions for updating:
The `validate_indices` argument has no effect. Indices are always validated on CPU and never validated on GPU.
I0625 06:56:01.794970 140604240011456 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
I0625 06:56:03.112913 140604240011456 controller.py:236] train | step: 0 | training until step 200000...
train | step: 0 | training until step 200000...
2021-06-25 06:56:04.121265: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2021-06-25 06:56:04.122489: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2593990000 Hz
INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
I0625 06:56:05.927121 140604240011456 cross_device_ops.py:903] batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
I0625 06:56:05.949938 140604240011456 cross_device_ops.py:903] batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
I0625 06:56:05.972526 140604240011456 cross_device_ops.py:903] batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
I0625 06:56:06.089528 140604240011456 cross_device_ops.py:903] batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
I0625 06:56:06.111567 140604240011456 cross_device_ops.py:903] batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
I0625 06:56:06.133234 140604240011456 cross_device_ops.py:903] batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
I0625 06:56:06.252249 140604240011456 cross_device_ops.py:903] batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
I0625 06:56:06.278362 140604240011456 cross_device_ops.py:903] batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
I0625 06:56:06.300985 140604240011456 cross_device_ops.py:903] batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
I0625 06:56:06.431849 140604240011456 cross_device_ops.py:903] batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
WARNING:tensorflow:From /home/mcg/miniconda3/envs/py37tf/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py:206: calling foldl_v2 (from tensorflow.python.ops.functional_ops) with back_prop=False is deprecated and will be removed in a future version.
Instructions for updating:
back_prop=False is deprecated. Consider using tf.stop_gradient instead.
Instead of:
results = tf.foldl(fn, elems, back_prop=False)
Use:
results = tf.nest.map_structure(tf.stop_gradient, tf.foldl(fn, elems))
W0625 06:56:43.346125 140596987537152 deprecation.py:601] From /home/mcg/miniconda3/envs/py37tf/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py:206: calling foldl_v2 (from tensorflow.python.ops.functional_ops) with back_prop=False is deprecated and will be removed in a future version.
Instructions for updating:
back_prop=False is deprecated. Consider using tf.stop_gradient instead.
Instead of:
results = tf.foldl(fn, elems, back_prop=False)
Use:
results = tf.nest.map_structure(tf.stop_gradient, tf.foldl(fn, elems))
WARNING:tensorflow:From /home/mcg/miniconda3/envs/py37tf/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py:463: calling while_loop_v2 (from tensorflow.python.ops.control_flow_ops) with back_prop=False is deprecated and will be removed in a future version.
Instructions for updating:
back_prop=False is deprecated. Consider using tf.stop_gradient instead.
Instead of:
results = tf.while_loop(c, b, vars, back_prop=False)
Use:
results = tf.nest.map_structure(tf.stop_gradient, tf.while_loop(c, b, vars))
W0625 06:56:43.658312 140596987537152 deprecation.py:601] From /home/mcg/miniconda3/envs/py37tf/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py:463: calling while_loop_v2 (from tensorflow.python.ops.control_flow_ops) with back_prop=False is deprecated and will be removed in a future version.
Instructions for updating:
back_prop=False is deprecated. Consider using tf.stop_gradient instead.
Instead of:
results = tf.while_loop(c, b, vars, back_prop=False)
Use:
results = tf.nest.map_structure(tf.stop_gradient, tf.while_loop(c, b, vars))
2021-06-25 07:01:32.667195: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-06-25 07:01:33.971927: I tensorflow/stream_executor/cuda/cuda_dnn.cc:359] Loaded cuDNN version 8100
2021-06-25 07:01:34.548444: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-06-25 07:01:34.911529: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-06-25 07:01:36.659327: I tensorflow/stream_executor/cuda/cuda_dnn.cc:359] Loaded cuDNN version 8100
2021-06-25 07:01:46.261119: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:177] Filling up shuffle buffer (this may take a while): 561 of 1000
2021-06-25 07:02:00.735113: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:177] Filling up shuffle buffer (this may take a while): 593 of 1000
2021-06-25 07:02:02.728721: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:177] Filling up shuffle buffer (this may take a while): 619 of 1000
2021-06-25 07:02:15.017214: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:177] Filling up shuffle buffer (this may take a while): 625 of 1000
2021-06-25 07:02:22.714957: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:177] Filling up shuffle buffer (this may take a while): 668 of 1000
2021-06-25 07:02:34.510389: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:177] Filling up shuffle buffer (this may take a while): 718 of 1000
2021-06-25 07:02:42.780139: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:177] Filling up shuffle buffer (this may take a while): 776 of 1000
2021-06-25 07:02:52.867365: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:177] Filling up shuffle buffer (this may take a while): 813 of 1000
2021-06-25 07:03:04.207901: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:177] Filling up shuffle buffer (this may take a while): 874 of 1000
2021-06-25 07:03:12.664182: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:177] Filling up shuffle buffer (this may take a while): 923 of 1000
2021-06-25 07:03:23.321355: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:177] Filling up shuffle buffer (this may take a while): 971 of 1000
2021-06-25 07:03:28.421338: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:230] Shuffle buffer filled.