YoloV3 Implemented in TensorFlow 2.0

Zihao Zhang

Last update: Dec 26, 2022

Overview

This repo provides a clean implementation of YoloV3 in TensorFlow 2.0 using all the best practices.

Key Features

Usage

Installation

Conda (Recommended)

# Tensorflow CPU
conda env create -f conda-cpu.yml
conda activate yolov3-tf2-cpu

# Tensorflow GPU
conda env create -f conda-gpu.yml
conda activate yolov3-tf2-gpu

Pip

pip install -r requirements.txt

Nvidia Driver (For GPU)

# Ubuntu 18.04
sudo apt-add-repository -r ppa:graphics-drivers/ppa
sudo apt install nvidia-driver-430
# Windows/Other
https://www.nvidia.com/Download/index.aspx

Convert pre-trained Darknet weights

# yolov3
wget https://pjreddie.com/media/files/yolov3.weights -O data/yolov3.weights
python convert.py --weights ./data/yolov3.weights --output ./checkpoints/yolov3.tf

# yolov3-tiny
wget https://pjreddie.com/media/files/yolov3-tiny.weights -O data/yolov3-tiny.weights
python convert.py --weights ./data/yolov3-tiny.weights --output ./checkpoints/yolov3-tiny.tf --tiny

Detection

# yolov3
python detect.py --image ./data/meme.jpg

# yolov3-tiny
python detect.py --weights ./checkpoints/yolov3-tiny.tf --tiny --image ./data/street.jpg

# webcam
python detect_video.py --video 0

# video file
python detect_video.py --video path_to_file.mp4 --weights ./checkpoints/yolov3-tiny.tf --tiny

# video file with output
python detect_video.py --video path_to_file.mp4 --output ./output.avi

Training

I have created a complete tutorial on how to train from scratch using the VOC2012 Dataset. See the documentation here https://github.com/zzh8829/yolov3-tf2/blob/master/docs/training_voc.md

For customized training, you need to generate TFRecord files following the TensorFlow Object Detection API. For example, you can use Microsoft VoTT to generate such a dataset. You can also use this script to create the Pascal VOC dataset.

Example command line arguments for training

python train.py --batch_size 8 --dataset ~/Data/voc2012.tfrecord --val_dataset ~/Data/voc2012_val.tfrecord --epochs 100 --mode eager_tf --transfer fine_tune

python train.py --batch_size 8 --dataset ~/Data/voc2012.tfrecord --val_dataset ~/Data/voc2012_val.tfrecord --epochs 100 --mode fit --transfer none

python train.py --batch_size 8 --dataset ~/Data/voc2012.tfrecord --val_dataset ~/Data/voc2012_val.tfrecord --epochs 100 --mode fit --transfer no_output

python train.py --batch_size 8 --dataset ~/Data/voc2012.tfrecord --val_dataset ~/Data/voc2012_val.tfrecord --epochs 10 --mode eager_fit --transfer fine_tune --weights ./checkpoints/yolov3-tiny.tf --tiny

Tensorflow Serving

You can export the model to tf serving

python export_tfserving.py --output serving/yolov3/1/
# verify tfserving graph
saved_model_cli show --dir serving/yolov3/1/ --tag_set serve --signature_def serving_default

The inputs are preprocessed images (see dataset.transform_images).

The outputs are:

yolo_nms_0: bounding boxes
yolo_nms_1: scores
yolo_nms_2: classes
yolo_nms_3: numbers of valid detections
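
As a rough sketch of how a client might consume these four outputs, the snippet below trims the arrays down to the valid detections of one image. The response layout (a dict keyed by output name, with a leading batch dimension and padding up to a fixed length) and the helper name valid_detections are assumptions made for illustration; check the real signature with saved_model_cli before relying on it.

```python
def valid_detections(outputs, image_index=0):
    """Return (box, score, class_id) tuples for the valid detections of one image.

    `outputs` is assumed to map the four output names to nested lists with a
    leading batch dimension; this layout is illustrative, not confirmed.
    """
    count = int(outputs["yolo_nms_3"][image_index])
    boxes = outputs["yolo_nms_0"][image_index][:count]
    scores = outputs["yolo_nms_1"][image_index][:count]
    classes = outputs["yolo_nms_2"][image_index][:count]
    return [(tuple(b), float(s), int(c)) for b, s, c in zip(boxes, scores, classes)]


# Hypothetical response for a batch of one image, padded to three slots.
example = {
    "yolo_nms_0": [[[0.1, 0.2, 0.5, 0.6], [0.3, 0.3, 0.9, 0.8], [0.0, 0.0, 0.0, 0.0]]],
    "yolo_nms_1": [[0.92, 0.61, 0.0]],
    "yolo_nms_2": [[0.0, 16.0, 0.0]],
    "yolo_nms_3": [2],
}
detections = valid_detections(example)
for box, score, class_id in detections:
    print(class_id, round(score, 2), box)
```

Only the first yolo_nms_3 entries are read, so any padding slots after them are ignored.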

Benchmark (No Training Yet)

Numbers are obtained with rough calculations from detect_video.py

Macbook Pro 13 (2.7GHz i5)

Detection	416x416	320x320	608x608
YoloV3	1000ms	500ms	1546ms
YoloV3-Tiny	100ms	58ms	208ms

Desktop PC (GTX 970)

Detection	416x416	320x320	608x608
YoloV3	74ms	57ms	129ms
YoloV3-Tiny	18ms	15ms	28ms

AWS g3.4xlarge (Tesla M60)

Detection	416x416	320x320	608x608
YoloV3	66ms	50ms	123ms
YoloV3-Tiny	15ms	10ms	24ms

RTX 2070 (credit to @AnaRhisT94)

Detection	416x416
YoloV3 predict_on_batch	29-32ms
YoloV3 predict_on_batch + TensorRT	22-28ms

Darknet version of YoloV3 at 416x416 takes 29ms on Titan X. Considering Titan X has about double the benchmark of Tesla M60, performance-wise this implementation is pretty comparable.

Implementation Details

Eager execution

Great addition for existing TensorFlow experts. Not very easy to use without some intermediate understanding of TensorFlow graphs. It is annoying when you accidentally use incompatible features like tensor.shape[0] or some sort of python control flow that works fine in eager mode, but totally breaks down when you try to compile the model to graph.

model(x) vs. model.predict(x)

When calling model(x) directly, we are executing the graph in eager mode. For model.predict, tf actually compiles the graph on the first run and then executes in graph mode. So if you are only running the model once, model(x) is faster since there is no compilation needed. Otherwise, model.predict or using an exported SavedModel graph is much faster (by 2x). For non-real-time usage, model.predict_on_batch is even faster (as tested by @AnaRhisT94).

GradientTape

Extremely useful for debugging purposes, since you can set breakpoints anywhere. You can compile all the keras fitting functionalities with gradient tape using the run_eagerly argument in model.compile. From my limited testing, all training methods, including GradientTape and keras.fit, eager or not, yield similar performance. But graph mode is still preferred since it's a tiny bit more efficient.

@tf.function

@tf.function is very cool. It's like an in-between version of eager and graph. You can step through the function by disabling tf.function and then gain performance when you enable it in production. Important note: you should not pass non-tensor (plain Python) parameters to a @tf.function, because every new value triggers a re-compilation (retracing) of the function. I am not sure what's the best way other than using globals.

absl.py (abseil)

Absolutely amazing. If you don't know already, absl.py is officially used by internal projects at Google. It standardizes application interface for Python and many other languages. After using it within Google, I was so excited to hear abseil going open source. It includes many decades of best practices learned from creating large size scalable applications. I literally have nothing bad to say about it, strongly recommend absl.py to everybody.

Loading pre-trained Darknet weights

Very hard with the pure functional API because the layer ordering is different in tf.keras and darknet. The clean solution here is creating sub-models in keras. Keras is not able to save nested models in h5 format properly, so TF Checkpoint is recommended since it's officially supported by TensorFlow.

tf.keras.layers.BatchNormalization

It doesn't work very well for transfer learning. There are many articles and github issues all over the internet. I used a simple hack to make it work nicer on transfer learning with small batches.

What is the output of transform_targets ???

I know it's very confusing, but the output is a tuple of shape

(
  [N, 13, 13, 3, 6],
  [N, 26, 26, 3, 6],
  [N, 52, 52, 3, 6]
)

where N is the number of labels in batch and the last dimension "6" represents [x, y, w, h, obj, class] of the bounding boxes.
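
To make the three shapes concrete, here is a small sketch that derives them from the input size. It assumes the usual YoloV3 strides of 32, 16 and 8 and 3 anchors per scale; the function name target_shapes is made up for this example and is not part of the repo.

```python
def target_shapes(n, size=416, strides=(32, 16, 8), anchors_per_scale=3):
    """Shapes of the three target tensors for a square input of `size` pixels."""
    if any(size % s for s in strides):
        raise ValueError("size must be divisible by every stride")
    # Last dimension holds [x, y, w, h, obj, class].
    return tuple((n, size // s, size // s, anchors_per_scale, 6) for s in strides)


shapes = target_shapes(n=8)
for shape in shapes:
    print(shape)
```

With the default size of 416 this gives the 13, 26 and 52 grids listed above; other sizes such as 320 or 608 scale the grids accordingly.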

IOU and Score Threshold

The default threshold is 0.5 for both IOU and score. You can adjust them according to your needs by setting the --yolo_iou_threshold and --yolo_score_threshold flags.

Maximum number of boxes

By default there can be a maximum of 100 bounding boxes per image. If for some reason you would like to have more boxes, you can use the --yolo_max_boxes flag.
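
As a rough illustration of how the IOU threshold, the score threshold and the maximum number of boxes interact, the sketch below runs a plain greedy non-maximum suppression in pure Python. It is not the code this repo uses; the function names are hypothetical, and the default values simply mirror the defaults described in this README.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5, score_threshold=0.5, max_boxes=100):
    """Return indices of kept boxes, highest score first."""
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    kept = []
    for i in order:
        if len(kept) == max_boxes:
            break
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


boxes = [
    (0.10, 0.10, 0.50, 0.50),
    (0.12, 0.12, 0.52, 0.52),  # near-duplicate of the first box
    (0.60, 0.60, 0.90, 0.90),
    (0.00, 0.00, 0.20, 0.20),  # low score, filtered out
]
scores = [0.90, 0.80, 0.70, 0.30]
kept = greedy_nms(boxes, scores)
print(kept)
```

Raising the IOU threshold keeps more overlapping boxes, raising the score threshold drops more low-confidence boxes, and the box limit caps whatever survives both filters.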

NAN Loss / Training Failed / Doesn't Converge

Many people including me have succeeded in training, so the code definitely works. @LongxingTan in https://github.com/zzh8829/yolov3-tf2/issues/128 provided some of his insights, summarized here:

For nan loss, try to make the learning rate smaller.
Double check the format of your input data. Data labelled by VoTT and labelImg is different, so make sure the input box is right, and check carefully that the format is x1/width,y1/height,x2/width,y2/height and NOT x1,y1,x2,y2 or x,y,w,h.
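
The required box format can be sketched as follows, assuming your annotation tool exports absolute pixel coordinates. Both helper names are hypothetical and not part of this repo, and the x,y,w,h variant assumes that x,y is the top-left corner, which may not match your tool.

```python
def from_xywh(x, y, w, h):
    """Convert a top-left (x, y, w, h) pixel box to pixel corners."""
    return (x, y, x + w, y + h)


def normalize_box(x1, y1, x2, y2, width, height):
    """Convert pixel corners to x1/width, y1/height, x2/width, y2/height."""
    if width <= 0 or height <= 0:
        raise ValueError("image width and height must be positive")
    if not (0 <= x1 < x2 <= width and 0 <= y1 < y2 <= height):
        raise ValueError("box must lie inside the image with x1 < x2 and y1 < y2")
    return (x1 / width, y1 / height, x2 / width, y2 / height)


# A 200x100 box whose top-left corner is at (100, 50) in a 400x200 image.
box = normalize_box(*from_xywh(100, 50, 200, 100), width=400, height=200)
print(box)
```

The range check fails loudly on boxes that are inverted or fall outside the image.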

Make sure to visualize your custom dataset using this tool

python tools/visualize_dataset.py --classes=./data/voc2012.names

It will output one random image from your dataset with labels to output.jpg. Training definitely won't work if the rendered label doesn't look correct.

Command Line Args Reference

convert.py:
  --output: path to output
    (default: './checkpoints/yolov3.tf')
  --[no]tiny: yolov3 or yolov3-tiny
    (default: 'false')
  --weights: path to weights file
    (default: './data/yolov3.weights')
  --num_classes: number of classes in the model
    (default: '80')
    (an integer)

detect.py:
  --classes: path to classes file
    (default: './data/coco.names')
  --image: path to input image
    (default: './data/girl.png')
  --output: path to output image
    (default: './output.jpg')
  --[no]tiny: yolov3 or yolov3-tiny
    (default: 'false')
  --weights: path to weights file
    (default: './checkpoints/yolov3.tf')
  --num_classes: number of classes in the model
    (default: '80')
    (an integer)

detect_video.py:
  --classes: path to classes file
    (default: './data/coco.names')
  --video: path to input video (use 0 for cam)
    (default: './data/video.mp4')
  --output: path to output video (remember to set right codec for given format. e.g. XVID for .avi)
    (default: None)
  --output_format: codec used in VideoWriter when saving video to file
    (default: 'XVID')
  --[no]tiny: yolov3 or yolov3-tiny
    (default: 'false')
  --weights: path to weights file
    (default: './checkpoints/yolov3.tf')
  --num_classes: number of classes in the model
    (default: '80')
    (an integer)

train.py:
  --batch_size: batch size
    (default: '8')
    (an integer)
  --classes: path to classes file
    (default: './data/coco.names')
  --dataset: path to dataset
    (default: '')
  --epochs: number of epochs
    (default: '2')
    (an integer)
  --learning_rate: learning rate
    (default: '0.001')
    (a number)
  --mode: <fit|eager_fit|eager_tf>: fit: model.fit, eager_fit: model.fit(run_eagerly=True), eager_tf: custom GradientTape
    (default: 'fit')
  --num_classes: number of classes in the model
    (default: '80')
    (an integer)
  --size: image size
    (default: '416')
    (an integer)
  --[no]tiny: yolov3 or yolov3-tiny
    (default: 'false')
  --transfer: <none|darknet|no_output|frozen|fine_tune>: none: Training from scratch, darknet: Transfer darknet, no_output: Transfer all but output, frozen: Transfer and freeze all,
    fine_tune: Transfer all and freeze darknet only
    (default: 'none')
  --val_dataset: path to validation dataset
    (default: '')
  --weights: path to weights file
    (default: './checkpoints/yolov3.tf')

Change Log

October 1, 2019

Updated to TensorFlow v2.0.0 release

References

It is pretty much impossible to implement this from the yolov3 paper alone. I had to reference the official (very hard to understand) and many un-official (many minor errors) repos to piece together the complete picture.

https://github.com/pjreddie/darknet
- official yolov3 implementation
https://github.com/AlexeyAB
- explanations of parameters
https://github.com/qqwweee/keras-yolo3
- models
- loss functions
https://github.com/YunYang1994/tensorflow-yolov3
- data transformations
- loss functions
https://github.com/ayooshkathuria/pytorch-yolo-v3
- models
https://github.com/broadinstitute/keras-resnet
- batch normalization fix

Comments

Can't start training.

I get the following error. I am not sure what I need to do to fix my tf record files.

2019-06-10 21:27:25.351566: W tensorflow/core/framework/op_kernel.cc:1431] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Feature: image/key/sha256 (data type: string) is required but could not be found.
2019-06-10 21:27:25.351650: W tensorflow/core/framework/op_kernel.cc:1431] OP_REQUIRES failed at iterator_ops.cc:988 : Invalid argument: Feature: image/key/sha256 (data type: string) is required but could not be found.
	 [[{{node ParseSingleExample/ParseSingleExample}}]]
Traceback (most recent call last):
  File "train.py", line 178, in <module>
    app.run(main)
  File "/home/shaun/tf2/env/lib/python3.6/site-packages/absl/app.py", line 300, in run
2019-06-10 21:27:25.351867: W tensorflow/core/framework/op_kernel.cc:1431] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Feature: image/key/sha256 (data type: string) is required but could not be found.
    _run_main(main, args)
  File "/home/shaun/tf2/env/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "train.py", line 173, in main
    validation_data=val_dataset)
  File "/home/shaun/tf2/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 791, in fit
    initial_epoch=initial_epoch)
  File "/home/shaun/tf2/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1515, in fit_generator
    steps_name='steps_per_epoch')
  File "/home/shaun/tf2/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_generator.py", line 213, in model_iteration
    batch_data = _get_next_batch(generator, mode)
  File "/home/shaun/tf2/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_generator.py", line 355, in _get_next_batch
    generator_output = next(generator)
  File "/home/shaun/tf2/env/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 556, in __next__
    return self.next()
  File "/home/shaun/tf2/env/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 585, in next
    return self._next_internal()
  File "/home/shaun/tf2/env/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 577, in _next_internal
    output_shapes=self._flat_output_shapes)
  File "/home/shaun/tf2/env/lib/python3.6/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 1954, in iterator_get_next_sync
    _six.raise_from(_core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Feature: image/key/sha256 (data type: string) is required but could not be found.
	 [[{{node ParseSingleExample/ParseSingleExample}}]] [Op:IteratorGetNextSync]

training inference

opened by shaunm 32

Has anyone successfully trained the code with any dataset?

I want to validate the code by training on VOC2007 and then write my own code partly based on it, but my model detects nothing. What on earth is the reason?
① I use the TF Object Detection API to create the tf record.
② The train command line is "python train.py --batch_size 8 --dataset xxx --val_dataset xxx --mode eager_tf --num_classes 20".
③ The detect command line is "python detect.py --weights ./xxx.tf".

training inference

opened by lazerliu 31

train.py nan values

python train.py --batch_size 8 --dataset train.tfrecord --val_dataset test.tfrecord --epochs 10 --mode eager_tf --transfer fine_tune --weights yolov3-tiny.tf --tiny

2019-11-18 17:22:38.548319: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2019-11-18 17:22:42.387889: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2019-11-18 17:22:43.440550: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.733 pciBusID: 0000:01:00.0
2019-11-18 17:22:43.447248: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-11-18 17:22:43.452289: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-11-18 17:22:43.455984: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-11-18 17:22:43.463149: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.733 pciBusID: 0000:01:00.0
2019-11-18 17:22:43.469160: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-11-18 17:22:43.473584: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-11-18 17:22:44.144084: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-18 17:22:44.149681: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2019-11-18 17:22:44.153899: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2019-11-18 17:22:44.158184: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4708 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-11-18 17:23:00.009183: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:143] Filling up shuffle buffer (this may take a while): 203 of 1024
2019-11-18 17:23:05.496208: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:193] Shuffle buffer filled.
2019-11-18 17:23:05.950491: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2019-11-18 17:23:06.784628: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows Relying on driver to perform ptx compilation. This message will be only logged once.
2019-11-18 17:23:07.185721: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
I1118 17:23:08.219413 23336 train.py:135] 1_train_0, nan, [nan, nan]
I1118 17:23:08.446805 23336 train.py:135] 1_train_1, nan, [nan, nan]
I1118 17:23:08.677870 23336 train.py:135] 1_train_2, nan, [nan, nan]
...

Has anyone had this issue and managed to fix it? I labeled my images in Microsoft VoTT, then exported the images to TFRecords. After this I created train and test sets by adding them to their respective tf.data.TFRecordDataset, which are the train.tfrecord and test.tfrecord files.
bug training

opened by JeffStodd 27

Cannot convert custom darknet model

I'm trying to convert my darknet weights to tensorflow weights using the command python convert.py --weights /path/to/weights --output ./checkpoints/yolo-obj.tf

And what I get is this error message:

File "convert.py", line 33, in <module>
    app.run(main)
  File "/home/raulberari/.conda/envs/yolov3-tf2/lib/python3.6/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/home/raulberari/.conda/envs/yolov3-tf2/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "convert.py", line 20, in main
    load_darknet_weights(yolo, FLAGS.weights, FLAGS.tiny)
  File "/home/raulberari/yolov3-tf2/yolov3_tf2/utils.py", line 66, in load_darknet_weights
    conv_shape).transpose([2, 3, 1, 0])
ValueError: cannot reshape array of size 42732 into shape (256,128,3,3)

This happens after I0801 10:46:50.183817 139702532433664 utils.py:45] yolo_output_2/conv2d_73 bn

Does anyone have an explanation for this? I'm running this in the given env, yolov3-tf2 on an Ubuntu machine.

opened by raulberari 25

small lessons from problems during training on my own dataset

Thanks zzh8829 for sharing the code, really nice writing, I like it. When I used it to try training my own dataset, I had some problems; here is how I solved them. Hope this saves some time for others.

nan loss

For nan loss, I first made the learning rate smaller.

I found that the data labelled by VoTT and labelImg is different, so make sure the input box is right (without nan, and the box is smaller than the width and height), and check carefully whether the box format is x1,y1,x2,y2, or x,y,w,h, or x1/width,y1/height,x2/width,y2/height.

loss is unbelievably large

The first step loss is ok, but after the 2nd step the loss is very large and can't converge any more. I changed the backbone part according to other repositories of yolov3, and that solved it.

hard to converge

Remove the sigmoid operator of class_prob_loss.

Add conf_focal=tf.pow(true_obj-pred_obj , 2) as a multiplier in confidence_loss.
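
To illustrate the second suggestion, here is a pure-Python sketch of a binary cross-entropy term scaled by that multiplier for a single prediction. It only illustrates the idea with scalar inputs and is not the loss code from the repo; focal_confidence_loss and binary_cross_entropy are made-up names.

```python
import math


def binary_cross_entropy(true_obj, pred_obj, eps=1e-7):
    """Binary cross-entropy for one probability, clipped for numerical safety."""
    p = min(max(pred_obj, eps), 1.0 - eps)
    return -(true_obj * math.log(p) + (1.0 - true_obj) * math.log(1.0 - p))


def focal_confidence_loss(true_obj, pred_obj):
    """Cross-entropy scaled by (true_obj - pred_obj) squared."""
    conf_focal = (true_obj - pred_obj) ** 2
    return conf_focal * binary_cross_entropy(true_obj, pred_obj)


easy = focal_confidence_loss(1.0, 0.9)  # confident and correct
hard = focal_confidence_loss(1.0, 0.1)  # confident and wrong
print(easy, hard)
```

The multiplier shrinks the loss of predictions that are already close to the target, so the badly predicted cells dominate the gradient.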

Resize the image by resize or pad

I checked the process to train VOC2012. If you use voc2012.py to save the tf-record, there is no problem. In object detection, if you resize the image with padding, then you have to pad the labelled box at the same time. But if you use the resize function in cv or tf, and the label is relative (0,1), then it is not necessary to adjust it.
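
A small sketch of the padding case, assuming the image is padded symmetrically to a square before resizing and that boxes are stored as relative corners in [0, 1]. pad_relative_box is a hypothetical helper, not part of the repo.

```python
def pad_relative_box(box, width, height):
    """Re-express a relative (x1, y1, x2, y2) box after centered square padding."""
    side = max(width, height)
    pad_x = (side - width) / 2   # padding added on the left
    pad_y = (side - height) / 2  # padding added on the top
    x1, y1, x2, y2 = box
    return (
        (x1 * width + pad_x) / side,
        (y1 * height + pad_y) / side,
        (x2 * width + pad_x) / side,
        (y2 * height + pad_y) / side,
    )


# A 400x200 image is padded with 100 pixels above and below.
padded = pad_relative_box((0.25, 0.5, 0.75, 1.0), width=400, height=200)
print(padded)
```

A plain resize of the padded square afterwards leaves these relative coordinates unchanged, which is why only the padding step needs the label adjustment.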

training

opened by LongxingTan 23

Unable to train Custom model

Hi,

I have been trying to train a custom model with 2 classes, and I have modified the training script to transfer knowledge from the trained yolo model. This is my train.py script (see attached file train.txt).

But unfortunately I can't make it work, it always fails with WARNING:tensorflow:Reduce LR on plateau conditioned on metricval_losswhich is not available. Available metrics are: lr W1002 12:07:19.268314 4404483520 callbacks.py:1824] Reduce LR on plateau conditioned on metricval_losswhich is not available. Available metrics are: lr WARNING:tensorflow:Early stopping conditioned on metricval_losswhich is not available. Available metrics are: W1002 12:07:19.268509 4404483520 callbacks.py:1250] Early stopping conditioned on metricval_losswhich is not available. Available metrics are: 1/Unknown - 8s 8s/stepTraceback (most recent call last): File "/Users/t230418/Downloads/TensorFlow2/train.py", line 187, in <module> app.run(main) File "/usr/local/lib/python3.7/site-packages/absl/app.py", line 299, in run _run_main(main, args) File "/usr/local/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main sys.exit(main(argv)) File "/Users/t230418/Downloads/TensorFlow2/train.py", line 182, in main validation_data=val_dataset) File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 728, in fit use_multiprocessing=use_multiprocessing) File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 324, in fit total_epochs=epochs) File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 123, in run_one_epoch batch_outs = execution_function(iterator) File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 86, in execution_function distributed_function(input_fn)) File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 73, in distributed_function per_replica_function, args=(model, x, y, sample_weights)) File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/distribute/distribute_lib.py", line 760, in experimental_run_v2 return 
self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs) File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/distribute/distribute_lib.py", line 1787, in call_for_each_replica return self._call_for_each_replica(fn, args, kwargs) File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/distribute/distribute_lib.py", line 2132, in _call_for_each_replica return fn(*args, **kwargs) File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/autograph/impl/api.py", line 258, in wrapper return func(*args, **kwargs) File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 264, in train_on_batch output_loss_metrics=model._output_loss_metrics) File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_eager.py", line 311, in train_on_batch output_loss_metrics=output_loss_metrics)) File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_eager.py", line 252, in _process_single_batch training=training)) File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_eager.py", line 166, in _model_loss per_sample_losses = loss_fn.call(targets[i], outs[i]) File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/keras/losses.py", line 221, in call return self.fn(y_true, y_pred, **self._fn_kwargs) File "/Users/t230418/Downloads/TensorFlow2/yolov3_tf2/models.py", line 304, in yolo_loss true_class_idx, pred_class) File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/keras/losses.py", line 978, in sparse_categorical_crossentropy y_true, y_pred, from_logits=from_logits, axis=axis) File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/keras/backend.py", line 4549, in sparse_categorical_crossentropy labels=target, logits=output) File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/ops/nn_ops.py", line 3477, in 
sparse_softmax_cross_entropy_with_logits_v2 labels=labels, logits=logits, name=name) File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/ops/nn_ops.py", line 3397, in sparse_softmax_cross_entropy_with_logits precise_logits, labels, name=name) File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_nn_ops.py", line 11838, in sparse_softmax_cross_entropy_with_logits _six.raise_from(_core._status_to_exception(e.code, message), None) File "<string>", line 3, in raise_from tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of -1 which is outside the valid range of [0, 2). Label values: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 [...] 0 0 -1 0 0 [...] 0 0 0 [Op:SparseSoftmaxCrossEntropyWithLogits]
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-6
W1002 12:07:19.918585 4404483520 util.py:144] Unresolved object in checkpoint: (root).layer-6
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-7
W1002 12:07:19.918745 4404483520 util.py:144] Unresolved object in checkpoint: (root).layer-7
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-8
W1002 12:07:19.918799 4404483520 util.py:144] Unresolved object in checkpoint: (root).layer-8
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-6.arguments
W1002 12:07:19.918851 4404483520 util.py:144] Unresolved object in checkpoint: (root).layer-6.arguments
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-6._variable_dict
W1002 12:07:19.918897 4404483520 util.py:144] Unresolved object in checkpoint: (root).layer-6._variable_dict
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-6._trainable_weights
W1002 12:07:19.918942 4404483520 util.py:144] Unresolved object in checkpoint: (root).layer-6._trainable_weights
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-6._non_trainable_weights
W1002 12:07:19.918987 4404483520 util.py:144] Unresolved object in checkpoint: (root).layer-6._non_trainable_weights
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-7.arguments
W1002 12:07:19.919031 4404483520 util.py:144] Unresolved object in checkpoint: (root).layer-7.arguments
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-7._variable_dict
W1002 12:07:19.919076 4404483520 util.py:144] Unresolved object in checkpoint: (root).layer-7._variable_dict
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-7._trainable_weights
W1002 12:07:19.919139 4404483520 util.py:144] Unresolved object in checkpoint: (root).layer-7._trainable_weights
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-7._non_trainable_weights
W1002 12:07:19.919196 4404483520 util.py:144] Unresolved object in checkpoint: (root).layer-7._non_trainable_weights
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-8.arguments
W1002 12:07:19.919239 4404483520 util.py:144] Unresolved object in checkpoint: (root).layer-8.arguments
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-8._variable_dict
W1002 12:07:19.919282 4404483520 util.py:144] Unresolved object in checkpoint: (root).layer-8._variable_dict
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-8._trainable_weights
W1002 12:07:19.919326 4404483520 util.py:144] Unresolved object in checkpoint: (root).layer-8._trainable_weights
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-8._non_trainable_weights
W1002 12:07:19.919369 4404483520 util.py:144] Unresolved object in checkpoint: (root).layer-8._non_trainable_weights
WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/alpha/guide/checkpoints#loading_mechanics for details.
W1002 12:07:19.919421 4404483520 util.py:152] A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/alpha/guide/checkpoints#loading_mechanics for details.

Any ideas?
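
One detail that stands out in the dump above is the single -1 among the zeros. As a hedged guess, if those values are the integer class labels passed to the sparse softmax cross-entropy op, a -1 falls outside the valid range [0, num_classes), which that op rejects. A minimal pure-Python sketch of such a range check follows; the function name and the sample values are made up for illustration and are not part of this repo:

```python
def find_invalid_labels(labels, num_classes):
    """Return (position, value) pairs for labels outside [0, num_classes)."""
    return [(i, v) for i, v in enumerate(labels) if not 0 <= v < num_classes]


# Hypothetical flattened label list: mostly zeros with one stray -1,
# loosely modeled on the dump in the error message.
labels = [0] * 10 + [-1] + [0] * 5
print(find_invalid_labels(labels, num_classes=80))
```

Running a check like this on the labels before they reach the loss can help locate the record that produced the bad value.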
training

opened by jllarraz 17
Transfer Learning for custom class number

How can we do transfer learning on our own dataset when the number of classes is not 80? In my case, training from scratch doesn't give great results. The darknet transfer mode seems to transfer even the YOLO output layers, whose shape depends on the number of classes.
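
One common approach, sketched here under assumptions rather than taken from this repo's code, is to copy pretrained weights layer by layer and skip the layers whose shape depends on the class count. In the sketch below plain dicts stand in for models, and the `yolo_output` prefix is an assumed naming convention for the class-dependent heads:

```python
def transfer_weights(pretrained, target, skip_prefix="yolo_output"):
    """Copy weights by layer name, skipping class-dependent output layers.

    Both arguments are dicts mapping layer name -> weights (any object).
    Returns the names of the layers that were copied.
    """
    copied = []
    for name, weights in pretrained.items():
        if name.startswith(skip_prefix) or name not in target:
            continue  # output heads keep their fresh initialization
        target[name] = weights
        copied.append(name)
    return copied


# Hypothetical layer names; real models hold arrays, not strings.
pretrained = {
    "yolo_darknet": "w_backbone",
    "yolo_conv_0": "w_neck",
    "yolo_output_0": "w_head_80_classes",
}
target = {"yolo_darknet": None, "yolo_conv_0": None, "yolo_output_0": "w_head_init"}
print(transfer_weights(pretrained, target))
```

The output heads are left at their initial values so they can be trained for the new class count, while the backbone starts from the pretrained weights.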
training

opened by Pari-singh 17

Problem with loss = nan during training

Hello, I've been trying to train this model on a custom dataset (11k training images and 1k validation images), but the loss always becomes NaN after a while.

Eager_tf mode:

I0326 12:39:25.933575 140234859489088 train.py:183] 1_train_84, 223747.34375, [15194.2705, 36603.82, 171938.5]
I0326 12:39:27.360369 140234859489088 train.py:183] 1_train_85, 223252.890625, [15160.514, 36423.156, 171658.48]
I0326 12:39:28.780844 140234859489088 train.py:183] 1_train_86, 223706.375, [15277.884, 36469.22, 171948.52]
I0326 12:39:30.230567 140234859489088 train.py:183] 1_train_87, 223358.9375, [15199.5625, 36386.203, 171762.4]
I0326 12:39:31.832662 140234859489088 train.py:183] 1_train_88, 223327.765625, [15265.209, 36317.28, 171734.5]
I0326 12:39:33.261120 140234859489088 train.py:183] 1_train_89, nan, [nan, 36458.277, 171607.92]
I0326 12:39:34.778065 140234859489088 train.py:183] 1_train_90, nan, [nan, nan, nan]
I0326 12:39:36.280590 140234859489088 train.py:183] 1_train_91, nan, [nan, nan, nan]
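
In the log above, one of the three bracketed values turns NaN a step before the others. A small parser can pinpoint that step automatically. This is a sketch that assumes the line layout shown above (step name, total loss, then a bracketed list, presumably one loss per output scale); `first_nan_step` is a hypothetical helper name:

```python
import math
import re

# Matches the tail of a log line: "] <step>, <total>, [<a>, <b>, <c>]"
LOG_LINE = re.compile(r"\] (\S+), (\S+), \[(.*)\]$")


def first_nan_step(lines):
    """Return (step_name, indices of NaN components) for the first NaN line."""
    for line in lines:
        match = LOG_LINE.search(line.strip())
        if not match:
            continue
        step, total, parts = match.groups()
        components = [float(p) for p in parts.split(",")]
        nan_idx = [i for i, v in enumerate(components) if math.isnan(v)]
        if math.isnan(float(total)) or nan_idx:
            return step, nan_idx
    return None


log = [
    "I0326 12:39:31.832662 140234859489088 train.py:183] 1_train_88, 223327.765625, [15265.209, 36317.28, 171734.5]",
    "I0326 12:39:33.261120 140234859489088 train.py:183] 1_train_89, nan, [nan, 36458.277, 171607.92]",
    "I0326 12:39:34.778065 140234859489088 train.py:183] 1_train_90, nan, [nan, nan, nan]",
]
print(first_nan_step(log))
```

Knowing which component fails first narrows the search to the batch consumed at that step.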

I also went and checked my tfrecords (which I create using a slightly edited voc2012.py) but I don't see anything wrong with the outputs:

features {
  feature {
    key: "image/encoded"
    value {
      bytes_list {
        value: "\377\330\377\340\000\020JFIF\000\001\001\000\000\001\000\001\000\000\377\333\000C\000\010\006\006\007\006\005\010\007\007\007\t\t\010\n\014\024\r\014\013\013\014\031\022\023\017\024\035\032\037\036\035\032\034\034 $.\' \",#\034\034(7),01444\037\'9=82<.342\377\333\000C\001\t\t\t\014\013\014\030\r\r\0302!\034!22222222222222222222222222222222222222222222222222\377\300\000\021\010\001\217\002X\003\001\"\000\002\021\001\003\021\001\377\304\000\037\000\000\001\005\00 EDITED THE REST OUT
      }
    }
  }
  feature {
    key: "image/filename"
    value {
      bytes_list {
        value: "koffer_02407.jpg"
      }
    }
  }
  feature {
    key: "image/format"
    value {
      bytes_list {
        value: "jpeg"
      }
    }
  }
  feature {
    key: "image/height"
    value {
      int64_list {
        value: 399
      }
    }
  }
  feature {
    key: "image/key/sha256"
    value {
      bytes_list {
        value: "84a55039a36c917657c916af1deb34548eee3cee6bf44f5eb59ee8cc5ffcfd23"
      }
    }
  }
  feature {
    key: "image/object/bbox/xmax"
    value {
      float_list {
        value: 0.9362444877624512
        value: 0.9292577505111694
      }
    }
  }
  feature {
    key: "image/object/bbox/xmin"
    value {
      float_list {
        value: 0.7266375422477722
        value: 0.74410480260849
      }
    }
  }
  feature {
    key: "image/object/bbox/ymax"
    value {
      float_list {
        value: 0.681598424911499
        value: 0.9622541666030884
      }
    }
  }
  feature {
    key: "image/object/bbox/ymin"
    value {
      float_list {
        value: 0.5006147623062134
        value: 0.7183197736740112
      }
    }
  }
  feature {
    key: "image/object/class/label"
    value {
      int64_list {
        value: 17
        value: 17
      }
    }
  }
  feature {
    key: "image/object/class/text"
    value {
      bytes_list {
        value: "koffer"
        value: "koffer"
      }
    }
  }
  feature {
    key: "image/object/difficult"
    value {
      int64_list {
        value: 0
        value: 0
      }
    }
  }
  feature {
    key: "image/object/truncated"
    value {
      int64_list {
        value: 0
        value: 0
      }
    }
  }
  feature {
    key: "image/object/view"
    value {
      bytes_list {
        value: "Unspecified"
        value: "Unspecified"
      }
    }
  }
  feature {
    key: "image/source_id"
    value {
      bytes_list {
        value: "koffer_02407.jpg"
      }
    }
  }
  feature {
    key: "image/width"
    value {
      int64_list {
        value: 600
      }
    }
  }
}

In my classes.names file I have 29 classes and 'koffer' is indeed on line 17. If need be, I could also post the edited version of voc2012.py; however, I only edited the file reading because my folder layout is a bit different from what's been posted here.

I've noticed that some xmin or ymin values can be really small e.g.:

feature {
    key: "image/object/bbox/xmin"
    value {
      float_list {
        value: 3.964704228565097e-05
      }
    }
  }

But I don't think that's the source of the problem. I've also checked against #128, but the xmin, xmax, ymin and ymax values in my .xml annotations are already correct (not out of bounds or anything).
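
A quick way to rule out annotation problems across the whole dataset is to check every record programmatically instead of inspecting a few by eye. The sketch below is an illustration, not code from this repo: `check_example` and the `min_size` threshold are hypothetical, and it assumes labels are zero-based indices into classes.names (so 29 classes means valid labels 0 to 28). Whether line 17 of the file should map to label 16 or 17 depends on how the conversion script numbers lines, which is worth verifying.

```python
def check_example(xmin, xmax, ymin, ymax, labels, num_classes, min_size=1e-3):
    """Return a list of human-readable problems for one record's annotations."""
    problems = []
    for i, (x0, x1, y0, y1, label) in enumerate(zip(xmin, xmax, ymin, ymax, labels)):
        if not (0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0):
            problems.append(f"box {i}: coordinates out of order or outside [0, 1]")
        elif (x1 - x0) < min_size or (y1 - y0) < min_size:
            problems.append(f"box {i}: degenerate size")
        if not 0 <= label < num_classes:
            problems.append(f"box {i}: label {label} outside [0, {num_classes})")
    return problems


# Values copied from the record shown above.
print(check_example(
    xmin=[0.7266375422477722, 0.74410480260849],
    xmax=[0.9362444877624512, 0.9292577505111694],
    ymin=[0.5006147623062134, 0.7183197736740112],
    ymax=[0.681598424911499, 0.9622541666030884],
    labels=[17, 17],
    num_classes=29,
))
```

A very small xmin on its own passes this check as long as the box still has a reasonable width and height.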

opened by JaroHelsen 13

Return y_true_out got all zeros

https://github.com/zzh8829/yolov3-tf2/blob/27a96bc42444b6412a73f13ae7446f01377496b4/yolov3_tf2/dataset.py#L42

Does this line of code really work? All my label data comes out as zeros, and I think this line is the cause. The function returns y_true_out, but after the zeros initialization I cannot find anywhere that assigns values to it.
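
One possible explanation, assuming the function fills the grid with a functional scatter update such as `tf.tensor_scatter_nd_update` (not verified here against the linked line): such an update returns a new tensor and leaves its input untouched, so the zeros tensor itself never appears to be assigned. The pure-Python stand-in below mimics that behavior; `scatter_update` is a hypothetical helper, not a TensorFlow API:

```python
import copy


def scatter_update(grid, indexes, updates):
    """Return a copy of `grid` with grid[r][c] = value for each index/update pair."""
    out = copy.deepcopy(grid)  # the input stays untouched
    for (r, c), value in zip(indexes, updates):
        out[r][c] = value
    return out


zeros = [[0.0] * 3 for _ in range(3)]
filled = scatter_update(zeros, [(0, 1), (2, 2)], [1.0, 5.0])
print(zeros[0][1], filled[0][1], filled[2][2])
```

If the result is still all zeros in practice, the list of indexes passed to the update is probably empty, which would point at the box-to-anchor matching rather than at the return statement.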

opened by jinfagang 13
BaseCollectiveExecuter::StartAbort Out of range: End of Sequence

Hi,

While training Tiny YOLO on the VOC dataset, I get the error "BaseCollectiveExecutor::StartAbort Out of range: End of sequence" at the end of each epoch. Training also stops early after 4 epochs. The terminal outputs are attached. Note that I am using Ubuntu 18.04 with TF 2.0.

In eager mode, training runs without problems.
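
On the early stop: assuming the training script registers a patience-based early-stopping callback (an assumption, since the callback list is not shown here), stopping after four epochs is what you would see with a patience of 3 when the validation loss never improves on the first epoch. A minimal sketch of that logic with made-up loss values:

```python
def stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which patience-based early stopping triggers,
    or None if training runs through all epochs."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:  # NaN never compares as smaller, so it never improves
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation losses: epoch 1 is never beaten within 3 epochs.
print(stopping_epoch([10.0, 11.0, 12.0, 10.5, 9.0], patience=3))
```

The same logic also explains why a validation loss that is NaN from the start ends training after exactly `patience` epochs.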
training

opened by muhammad-maaz-confiz 12
Questions about transfer learning and training loss = nan
Thanks for the code. I am doing transfer learning with the yolov3 tf2 model using my own dataset (only one custom class - outside coco). Does the transfer learning function work in my case?

When I put in everything and trained a new model, I got a loss = nan. Below is the log. Could you point me to the problem? Thanks!

57/Unknown - 54s 945ms/step - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - yolo_output_2_loss: nan
2020-02-21 09:01:43.873721: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Out of range: End of sequence [[{{node IteratorGetNext}}]]
2020-02-21 09:01:43.873761: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Out of range: End of sequence [[{{node IteratorGetNext}}]] [[loss/yolo_output_0_loss/Shape_1/_12]]
2020-02-21 09:01:51.458659: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Out of range: End of sequence [[{{node IteratorGetNext}}]]
2020-02-21 09:01:51.459038: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Out of range: End of sequence [[{{node IteratorGetNext}}]] [[loss/yolo_output_1_loss/Shape_1/_14]]
D:\software\conda_envs\tf2\lib\site-packages\tensorflow_core\python\keras\callbacks.py:1806: RuntimeWarning: invalid value encountered in less
  self.monitor_op = lambda a, b: np.less(a, b - self.min_delta)
D:\software\conda_envs\tf2\lib\site-packages\tensorflow_core\python\keras\callbacks.py:1225: RuntimeWarning: invalid value encountered in less
  if self.monitor_op(current - self.min_delta, self.best):

Epoch 00001: saving model to checkpoints/yolov3_train_1.tf
57/57 [==============================] - 64s 1s/step - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - yolo_output_2_loss: nan - val_loss: nan - val_yolo_output_0_loss: nan - val_yolo_output_1_loss: nan - val_yolo_output_2_loss: nan
Epoch 2/10
1/7 [===>..........................] - ETA: 30s - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - yolo_output_2_loss: nan
2/7 [=======>......................] - ETA: 13s - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - yolo_output_2_loss: nan
3/7 [===========>..................] - ETA: 8s - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - yolo_output_2_loss: nan
4/7 [================>.............] - ETA: 4s - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - yolo_output_2_loss: nan
5/7 [====================>.........] - ETA: 2s - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - yolo_output_2_loss: nan
6/7 [========================>.....] - ETA: 1s - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - yolo_output_2_loss: nan
2020-02-21 09:02:25.001690: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Out of range: End of sequence [[{{node IteratorGetNext}}]]
2020-02-21 09:02:25.001806: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Out of range: End of sequence [[{{node IteratorGetNext}}]] [[loss/yolo_output_0_loss/Shape_1/_12]]
2020-02-21 09:02:29.828009: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Out of range: End of sequence [[{{node IteratorGetNext}}]]
2020-02-21 09:02:29.828422: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Out of range: End of sequence [[{{node IteratorGetNext}}]] [[loss/yolo_output_1_loss/Shape_1/_14]]

Epoch 00002: saving model to checkpoints/yolov3_train_2.tf
57/57 [==============================] - 38s 673ms/step - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - yolo_output_2_loss: nan - val_loss: nan - val_yolo_output_0_loss: nan - val_yolo_output_1_loss: nan - val_yolo_output_2_loss: nan
Epoch 3/10
1/7 [===>..........................] - ETA: 30s - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - yolo_output_2_loss: nan
2/7 [=======>......................] - ETA: 14s - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - yolo_output_2_loss: nan
3/7 [===========>..................] - ETA: 8s - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - yolo_output_2_loss: nan
4/7 [================>.............] - ETA: 5s - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - yolo_output_2_loss: nan
5/7 [====================>.........] - ETA: 2s - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - yolo_output_2_loss: nan
6/7 [========================>.....] - ETA: 1s - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - yolo_output_2_loss: nan
opened by jackyvr 11
Bump tensorflow from 2.5.1 to 2.9.3
Bumps tensorflow from 2.5.1 to 2.9.3.

Release notes

Sourced from tensorflow's releases.

TensorFlow 2.9.3

Release 2.9.3

This release introduces several vulnerability fixes:

Fixes an overflow in tf.keras.losses.poisson (CVE-2022-41887)

Fixes a heap OOB failure in ThreadUnsafeUnigramCandidateSampler caused by missing validation (CVE-2022-41880)

Fixes a segfault in ndarray_tensor_bridge (CVE-2022-41884)

Fixes an overflow in FusedResizeAndPadConv2D (CVE-2022-41885)

Fixes an overflow in ImageProjectiveTransformV2 (CVE-2022-41886)

Fixes an FPE in tf.image.generate_bounding_box_proposals on GPU (CVE-2022-41888)

Fixes a segfault in pywrap_tfe_src caused by invalid attributes (CVE-2022-41889)

Fixes a CHECK fail in BCast (CVE-2022-41890)

Fixes a segfault in TensorListConcat (CVE-2022-41891)

Fixes a CHECK_EQ fail in TensorListResize (CVE-2022-41893)

Fixes an overflow in CONV_3D_TRANSPOSE on TFLite (CVE-2022-41894)

Fixes a heap OOB in MirrorPadGrad (CVE-2022-41895)

Fixes a crash in Mfcc (CVE-2022-41896)

Fixes a heap OOB in FractionalMaxPoolGrad (CVE-2022-41897)

Fixes a CHECK fail in SparseFillEmptyRowsGrad (CVE-2022-41898)

Fixes a CHECK fail in SdcaOptimizer (CVE-2022-41899)

Fixes a heap OOB in FractionalAvgPool and FractionalMaxPool (CVE-2022-41900)

Fixes a CHECK_EQ in SparseMatrixNNZ (CVE-2022-41901)

Fixes an OOB write in grappler (CVE-2022-41902)

Fixes an overflow in ResizeNearestNeighborGrad (CVE-2022-41907)

Fixes a CHECK fail in PyFunc (CVE-2022-41908)

Fixes a segfault in CompositeTensorVariantToComponents (CVE-2022-41909)

Fixes an invalid char to bool conversion in printing a tensor (CVE-2022-41911)

Fixes a heap overflow in QuantizeAndDequantizeV2 (CVE-2022-41910)

Fixes a CHECK failure in SobolSample via missing validation (CVE-2022-35935)

Fixes a CHECK fail in TensorListScatter and TensorListScatterV2 in eager mode (CVE-2022-35935)

TensorFlow 2.9.2

Release 2.9.2

This release introduces several vulnerability fixes:

Fixes a CHECK failure in tf.reshape caused by overflows (CVE-2022-35934)

Fixes a CHECK failure in SobolSample caused by missing validation (CVE-2022-35935)

Fixes an OOB read in Gather_nd op in TF Lite (CVE-2022-35937)

Fixes a CHECK failure in TensorListReserve caused by missing validation (CVE-2022-35960)

Fixes an OOB write in Scatter_nd op in TF Lite (CVE-2022-35939)

Fixes an integer overflow in RaggedRangeOp (CVE-2022-35940)

Fixes a CHECK failure in AvgPoolOp (CVE-2022-35941)

Fixes CHECK failures in UnbatchGradOp (CVE-2022-35952)

Fixes a segfault in the TFLite converter on per-channel quantized transposed convolutions (CVE-2022-36027)

Fixes CHECK failures in AvgPool3DGrad (CVE-2022-35959)

Fixes CHECK failures in FractionalAvgPoolGrad (CVE-2022-35963)

Fixes a segfault in BlockLSTMGradV2 (CVE-2022-35964)

Fixes a segfault in LowerBound and UpperBound (CVE-2022-35965)

... (truncated)

Changelog

Sourced from tensorflow's changelog.

Release 2.8.4

This release introduces several vulnerability fixes:

Fixes a heap OOB failure in ThreadUnsafeUnigramCandidateSampler caused by missing validation (CVE-2022-41880)

Fixes a segfault in ndarray_tensor_bridge (CVE-2022-41884)

Fixes an overflow in FusedResizeAndPadConv2D (CVE-2022-41885)

Fixes an overflow in ImageProjectiveTransformV2 (CVE-2022-41886)

Fixes an FPE in tf.image.generate_bounding_box_proposals on GPU (CVE-2022-41888)

Fixes a segfault in pywrap_tfe_src caused by invalid attributes (CVE-2022-41889)

Fixes a CHECK fail in BCast (CVE-2022-41890)

Fixes a segfault in TensorListConcat (CVE-2022-41891)

Fixes a CHECK_EQ fail in TensorListResize (CVE-2022-41893)

Fixes an overflow in CONV_3D_TRANSPOSE on TFLite (CVE-2022-41894)

Fixes a heap OOB in MirrorPadGrad (CVE-2022-41895)

Fixes a crash in Mfcc (CVE-2022-41896)

Fixes a heap OOB in FractionalMaxPoolGrad (CVE-2022-41897)

Fixes a CHECK fail in SparseFillEmptyRowsGrad (CVE-2022-41898)

Fixes a CHECK fail in SdcaOptimizer (CVE-2022-41899)

... (truncated)

Commits

a5ed5f3 Merge pull request #58584 from tensorflow/vinila21-patch-2

258f9a1 Update py_func.cc

cd27cfb Merge pull request #58580 from tensorflow-jenkins/version-numbers-2.9.3-24474

3e75385 Update version numbers to 2.9.3

bc72c39 Merge pull request #58482 from tensorflow-jenkins/relnotes-2.9.3-25695

3506c90 Update RELEASE.md

8dcb48e Update RELEASE.md

4f34ec8 Merge pull request #58576 from pak-laura/c2.99f03a9d3bafe902c1e6beb105b2f2417...

6fc67e4 Replace CHECK with returning an InternalError on failing to create python tuple

5dbe90a Merge pull request #58570 from tensorflow/r2.9-7b174a0f2e4

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0
Bump tensorflow-gpu from 2.5.1 to 2.9.3
Bumps tensorflow-gpu from 2.5.1 to 2.9.3.

dependencies
opened by dependabot[bot] 0
How to create tfrecord for coco dataset.

I would like to train this model on the COCO dataset, but I don't know how to create the tfrecord files for it. The repo only provides tfrecord creation code for VOC, not for COCO. Can someone help me with this?
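
As a starting point, the main difference on the annotation side is the box format: COCO stores each box as [x, y, width, height] in pixels, measured from the top-left corner, while the VOC-style records shown elsewhere on this page hold normalized xmin, xmax, ymin and ymax values. A minimal conversion sketch follows; the function name is made up for illustration, and writing the actual tfrecord is left out:

```python
def coco_box_to_normalized(bbox, image_width, image_height):
    """Convert a COCO [x, y, width, height] pixel box to normalized corners.

    Returns (xmin, ymin, xmax, ymax), each clamped to [0, 1].
    """
    def clamp(value):
        # Absorb small annotation overshoots past the image border.
        return min(max(value, 0.0), 1.0)

    x, y, w, h = bbox
    return (
        clamp(x / image_width),
        clamp(y / image_height),
        clamp((x + w) / image_width),
        clamp((y + h) / image_height),
    )


# Hypothetical annotation on a 600x400 image.
print(coco_box_to_normalized([150, 100, 300, 200], 600, 400))
```

COCO category ids are not contiguous, so they would also need to be remapped to zero-based indices that match the lines of the class names file.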

opened by NaveenVinayakS 1
Yolo loss binary_crossentropy version

Change

# TODO: use binary_crossentropy instead
class_loss = obj_mask * sparse_categorical_crossentropy(
    true_class_idx, pred_class)

to

true_class_onehot = tf.one_hot(
    tf.cast(true_class_idx, tf.int64), depth=classes, axis=-1)
true_class_binary = tf.reshape(
    true_class_onehot,
    (tf.shape(y_true)[0], grid_size, grid_size, tf.shape(y_true)[3], -1, 1))
pred_class_binary = tf.reshape(
    pred_class,
    (tf.shape(y_true)[0], grid_size, grid_size, tf.shape(y_true)[3], -1, 1))
class_loss = obj_mask * tf.reduce_sum(
    binary_crossentropy(true_class_binary, pred_class_binary), axis=-1)
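
To make the difference between the two losses concrete, here is a pure-Python sketch for a single box. It is a simplified stand-in, not the Keras implementation: `picked_class_nll` only looks at the probability of the true class and ignores any renormalization a framework may apply, while `binary_crossentropy_sum` adds up one binary cross-entropy term per class against a one-hot target, as the proposed change does. Both function names and the sample probabilities are made up for illustration.

```python
import math


def picked_class_nll(true_class_idx, pred_probs, eps=1e-7):
    """Negative log-likelihood of the true class only."""
    return -math.log(max(pred_probs[true_class_idx], eps))


def binary_crossentropy_sum(true_class_idx, pred_probs, eps=1e-7):
    """Sum of per-class binary cross-entropy against a one-hot target."""
    total = 0.0
    for k, p in enumerate(pred_probs):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        target = 1.0 if k == true_class_idx else 0.0
        total -= target * math.log(p) + (1.0 - target) * math.log(1.0 - p)
    return total


# Hypothetical independent per-class probabilities for one box.
probs = [0.9, 0.2, 0.1]
print(picked_class_nll(0, probs))
print(binary_crossentropy_sum(0, probs))
```

The binary version also penalizes confident predictions for the wrong classes, which matters when the class scores are independent sigmoid outputs rather than a softmax.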

opened by ZXTFINAL 0
"Windows fatal exception: access violation" when run export_tflite.py

When I run export_tflite.py, it fails with:

I0223 16:11:15.828265 8204 export_tflite.py:48] tflite model loaded
Windows fatal exception: access violation

Current thread 0x0000200c (most recent call first):
  File "F:\Anaconda3\envs\TF2.6\lib\site-packages\tensorflow\lite\python\interpreter.py", line 875 in invoke
  File "G:\python_project\AI\yolov3_test\export_tflite.py", line 63 in main
  File "F:\Anaconda3\envs\TF2.6\lib\site-packages\absl\app.py", line 258 in _run_main
  File "F:\Anaconda3\envs\TF2.6\lib\site-packages\absl\app.py", line 312 in run
  File "G:\python_project\AI\yolov3_test\export_tflite.py", line 70 in

Process finished with exit code -1073741819 (0xC0000005)

I tried debugging, and the problem seems to be in the call to interpreter.invoke(), but I don't know how to fix it. Please help.

opened by jharden13HOU 0