Semantic Segmentation Suite in TensorFlow

News

  • This repo has been deprecated and issues will no longer be handled. Feel free to use it as is :)

Description

This repository serves as a Semantic Segmentation Suite. The goal is to make it easy to implement, train, and test new semantic segmentation models! It comes complete with the following:

  • Training and testing modes
  • Data augmentation
  • Several state-of-the-art models, with easy plug-and-play switching between them
  • Able to use any dataset
  • Evaluation metrics including precision, recall, F1 score, average accuracy, per-class accuracy, and mean IoU (a sketch of the IoU computation follows this list)
  • Plotting of loss function and accuracy over epochs
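
As a reference for the evaluation metrics, here is a minimal NumPy sketch of how mean IoU can be computed from integer class maps (an illustration of the metric, not the exact code in utils.py):

    import numpy as np

    def mean_iou(pred, label, num_classes):
        """Average per-class intersection-over-union, skipping classes
        that do not appear in the ground truth."""
        ious = []
        for c in range(num_classes):
            pred_c = (pred == c)
            label_c = (label == c)
            if not label_c.any():
                continue  # class absent from the ground truth
            intersection = np.logical_and(pred_c, label_c).sum()
            union = np.logical_or(pred_c, label_c).sum()
            ious.append(intersection / union)
        return float(np.mean(ious))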

Any suggestions to improve this repository, including any new segmentation models you would like to see are welcome!

You can also check out my Transfer Learning Suite.

Citing

If you find this repository useful, please consider citing it using a link to the repo :)

Frontends

The following feature extraction models are currently made available:

Models

The following segmentation models are currently made available:

Files and Directories

  • train.py: Training on the dataset of your choice. Default is CamVid

  • test.py: Testing on the dataset of your choice. Default is CamVid

  • predict.py: Use your newly trained model to run a prediction on a single image

  • helper.py: Quick helper functions for data preparation and visualization

  • utils.py: Utilities for printing, debugging, testing, and evaluation

  • models: Folder containing all model files. Use this to build your models, or use a pre-built one

  • CamVid: The CamVid dataset for semantic segmentation as a test bed. This is the 32-class version

  • checkpoints: Checkpoint files for each epoch during training

  • Test: Test results including images, per-class accuracies, precision, recall, and F1 score

Installation

This project has the following dependencies:

  • Numpy sudo pip install numpy

  • OpenCV Python sudo apt-get install python-opencv

  • TensorFlow sudo pip install --upgrade tensorflow-gpu

Usage

The only thing you have to do to get started is set up the folders in the following structure:

├── "dataset_name"                   
|   ├── train
|   ├── train_labels
|   ├── val
|   ├── val_labels
|   ├── test
|   ├── test_labels
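
If you are starting from scratch, a minimal Python snippet can create this layout; "my_dataset" below is just a placeholder name:

    import os

    # Create the six folders expected under the dataset directory.
    for split in ("train", "train_labels", "val", "val_labels",
                  "test", "test_labels"):
        os.makedirs(os.path.join("my_dataset", split), exist_ok=True)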

Put a CSV file called "class_dict.csv" in the dataset directory, containing the list of classes along with the R, G, B colour values used to visualize the segmentation results. This kind of dictionary is usually supplied with the dataset. Here is an example for the CamVid dataset:

name,r,g,b
Animal,64,128,64
Archway,192,0,128
Bicyclist,0,128,192
Bridge,0,128,64
Building,128,0,0
Car,64,0,128
CartLuggagePram,64,0,192
Child,192,128,64
Column_Pole,192,192,128
Fence,64,64,128
LaneMkgsDriv,128,0,192
LaneMkgsNonDriv,192,0,64
Misc_Text,128,128,64
MotorcycleScooter,192,0,192
OtherMoving,128,64,64
ParkingBlock,64,192,128
Pedestrian,64,64,0
Road,128,64,128
RoadShoulder,128,128,192
Sidewalk,0,0,192
SignSymbol,192,128,128
Sky,128,128,128
SUVPickupTruck,64,128,192
TrafficCone,0,0,64
TrafficLight,0,64,64
Train,192,64,128
Tree,128,128,0
Truck_Bus,192,128,192
Tunnel,64,0,64
VegetationMisc,192,192,0
Void,0,0,0
Wall,64,192,0
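
A file like this can be parsed in a few lines; the repo's helpers.py provides an equivalent get_label_info() function, so treat this as an illustrative sketch:

    import csv

    def get_label_info(csv_path):
        """Read class names and their R, G, B colour values from class_dict.csv."""
        class_names, label_values = [], []
        with open(csv_path, newline='') as f:
            for row in csv.DictReader(f):
                class_names.append(row['name'])
                label_values.append([int(row['r']), int(row['g']), int(row['b'])])
        return class_names, label_values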

Note: If you are using any of the networks that rely on a pre-trained ResNet, then you will need to download the pre-trained weights using the provided script. These are currently: PSPNet, RefineNet, DeepLabV3, DeepLabV3+, GCN.

Then you can simply run train.py! Check out the optional command line arguments:

usage: train.py [-h] [--num_epochs NUM_EPOCHS]
                [--checkpoint_step CHECKPOINT_STEP]
                [--validation_step VALIDATION_STEP] [--image IMAGE]
                [--continue_training CONTINUE_TRAINING] [--dataset DATASET]
                [--crop_height CROP_HEIGHT] [--crop_width CROP_WIDTH]
                [--batch_size BATCH_SIZE] [--num_val_images NUM_VAL_IMAGES]
                [--h_flip H_FLIP] [--v_flip V_FLIP] [--brightness BRIGHTNESS]
                [--rotation ROTATION] [--model MODEL] [--frontend FRONTEND]

optional arguments:
  -h, --help            show this help message and exit
  --num_epochs NUM_EPOCHS
                        Number of epochs to train for
  --checkpoint_step CHECKPOINT_STEP
                        How often to save checkpoints (epochs)
  --validation_step VALIDATION_STEP
                        How often to perform validation (epochs)
  --image IMAGE         The image you want to predict on. Only valid in
                        "predict" mode.
  --continue_training CONTINUE_TRAINING
                        Whether to continue training from a checkpoint
  --dataset DATASET     Dataset you are using.
  --crop_height CROP_HEIGHT
                        Height of cropped input image to network
  --crop_width CROP_WIDTH
                        Width of cropped input image to network
  --batch_size BATCH_SIZE
                        Number of images in each batch
  --num_val_images NUM_VAL_IMAGES
                        The number of images to use for validation
  --h_flip H_FLIP       Whether to randomly flip the image horizontally for
                        data augmentation
  --v_flip V_FLIP       Whether to randomly flip the image vertically for data
                        augmentation
  --brightness BRIGHTNESS
                        Whether to randomly change the image brightness for
                        data augmentation. Specifies the max brightness change
                        as a factor between 0.0 and 1.0. For example, 0.1
                        represents a max brightness change of 10% (+-).
  --rotation ROTATION   Whether to randomly rotate the image for data
                        augmentation. Specifies the max rotation angle in
                        degrees.
  --model MODEL         The model you are using. See model_builder.py for
                        supported models
  --frontend FRONTEND   The frontend you are using. See frontend_builder.py
                        for supported models
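
For example, a training run matching the settings used in the Results section below would look like this (adjust the dataset and model names to your setup):

    python train.py --dataset CamVid --model FC-DenseNet103 --batch_size 1 --num_epochs 300 --crop_height 352 --crop_width 480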

Results

These are some sample results for the CamVid dataset with 11 classes (previous research version).

In training, I used a batch size of 1 and an image size of 352x480. The following results are for the FC-DenseNet103 model trained for 300 epochs. I used RMSProp with a learning rate of 0.001 and decay of 0.995. Unlike the paper, I did not use any data augmentation, and I also didn't use any class balancing. These are just some quick-and-dirty example results.
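
For reference, those optimizer settings correspond to the following TF 1.x call (a sketch with a stand-in loss; in train.py the loss is the network's softmax cross-entropy):

    import tensorflow as tf

    # Stand-in scalar loss so the snippet runs on its own.
    w = tf.Variable(1.0)
    loss = tf.square(w)

    # RMSProp with learning rate 0.001 and decay 0.995, as described above.
    train_op = tf.train.RMSPropOptimizer(learning_rate=0.001,
                                         decay=0.995).minimize(loss)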

Note that the checkpoint files are not uploaded to this repository since they are too big for GitHub (greater than 100 MB)

Class        Original Accuracy   My Accuracy
Sky          93.0                94.1
Building     83.0                81.2
Pole         37.8                38.3
Road         94.5                97.5
Pavement     82.2                87.9
Tree         77.3                75.5
SignSymbol   43.9                49.7
Fence        37.1                69.0
Car          77.3                87.0
Pedestrian   59.6                60.3
Bicyclist    50.5                75.3
Unlabelled   N/A                 40.9
Global       91.5                89.6
[Figures: Loss vs. Epochs and Validation Accuracy vs. Epochs]
[Figures: Original image, Ground Truth, and Result]
Comments
  • Am I doing something wrong? Current_Loss = 0.0000

    Information

    Please specify the following information when submitting an issue:

    • What are your command line arguments?: python train.py --dataset for_GeorgeSeif_training --crop_width 128 --crop_height 128 --h_flip false --v_flip false --model DeepLabV3 --frontend MobileNetV2
    • Have you written any custom code?: I have made two modifications to utils.py (a sketch of the second one follows this list):
    1. to allow load_image to load grayscale masks
    2. to give a warning, rather than an exception, if the image is smaller than the size to which it should be cropped in random_crop (and then return the unaltered image and label)
    • What have you done to try and solve this issue?: Nothing, I am not sure if this is an issue or intended behaviour.
    • TensorFlow version?: 1.10.1
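
    For context, the second modification might look something like this (a hypothetical sketch of the change described above, not the reporter's actual code):

    import warnings
    import numpy as np

    def random_crop(image, label, crop_height, crop_width):
        # Warn and return the pair unaltered if the image is smaller
        # than the crop size, instead of raising an exception.
        h, w = image.shape[0], image.shape[1]
        if h < crop_height or w < crop_width:
            warnings.warn("Image (%dx%d) is smaller than the crop size "
                          "(%dx%d); returning it unaltered."
                          % (h, w, crop_height, crop_width))
            return image, label
        y = np.random.randint(0, h - crop_height + 1)
        x = np.random.randint(0, w - crop_width + 1)
        return (image[y:y + crop_height, x:x + crop_width],
                label[y:y + crop_height, x:x + crop_width])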

    Describe the problem

    The training routine seems to be running fine, but gives status updates that look like:

    [2018-09-05 12:20:49] Epoch = 0 Count = 20 Current_Loss = 0.0000 Time = 3.27
    [2018-09-05 12:20:52] Epoch = 0 Count = 40 Current_Loss = 0.0000 Time = 2.86
    [2018-09-05 12:20:54] Epoch = 0 Count = 60 Current_Loss = 0.0000 Time = 2.92
    [2018-09-05 12:20:58] Epoch = 0 Count = 80 Current_Loss = 0.0000 Time = 3.17
    etc.

    I expected that the Current_Loss would be non-zero, and that it would decrease as learning progresses. But perhaps I am misunderstanding something?

    Source code / logs

    n/a

    opened by chrisrapson 24
  • training time

    How long does it take to train this model? When I run this code, I get the following messages and the training process is extremely slow. Why?

    The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
    2017-12-30 02:23:19.780078: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
    This model has 9270812 trainable parameters
    ***** Begin training *****
    [2017-12-30 02:31:59] Epoch = 0 Count = 20 Current = 2.52 Time = 25.51
    [2017-12-30 02:40:30] Epoch = 0 Count = 40 Current = 2.36 Time = 25.65
    [2017-12-30 02:49:02] Epoch = 0 Count = 60 Current = 2.20 Time = 25.60
    [2017-12-30 02:57:29] Epoch = 0 Count = 80 Current = 2.02 Time = 25.34
    [2017-12-30 03:05:56] Epoch = 0 Count = 100 Current = 1.81 Time = 25.45
    [2017-12-30 03:14:23] Epoch = 0 Count = 120 Current = 1.42 Time = 25.31
    [2017-12-30 03:22:50] Epoch = 0 Count = 140 Current = 1.61 Time = 25.34
    [2017-12-30 03:31:18] Epoch = 0 Count = 160 Current = 1.49 Time = 25.29
    [2017-12-30 03:39:44] Epoch = 0 Count = 180 Current = 1.65 Time = 25.30
    [2017-12-30 03:48:13] Epoch = 0 Count = 200 Current = 1.42 Time = 25.48
    [2017-12-30 03:56:43] Epoch = 0 Count = 220 Current = 1.50 Time = 25.55
    [2017-12-30 04:05:14] Epoch = 0 Count = 240 Current = 1.60 Time = 25.81
    [2017-12-30 04:13:48] Epoch = 0 Count = 260 Current = 1.46 Time = 25.56
    [2017-12-30 04:22:18] Epoch = 0 Count = 280 Current = 1.89 Time = 25.88
    [2017-12-30 04:30:49] Epoch = 0 Count = 300 Current = 1.46 Time = 25.55
    [2017-12-30 04:39:22] Epoch = 0 Count = 320 Current = 1.32 Time = 25.33

    opened by lixiang-ucas 15
  • Incompatible shapes: [1,512,512] vs. [2] in binary problem after activating --class_balancing

    1. Bug reports

    Information

    Please specify the following information when submitting an issue:

    • What are your command line arguments?: python main.py --num_epochs 50 --mode train --dataset AerialLane18_512 --model DeepLabV3_plus-Res50 --class_balancing True
    • Have you written any custom code?: No
    • TensorFlow version?: 1.8.0

    Describe the problem

    In a binary problem of segmenting lanes and non-lanes, activating --class_balancing throws the following error:

    ***** Begin training *****
    Dataset --> AerialLane18_512
    Model --> DeepLabV3_plus-Res50
    Crop Height --> 512
    Crop Width --> 512
    Num Epochs --> 50
    Batch Size --> 1
    Num Classes --> 2
    Data Augmentation:
        Vertical Flip --> False
        Horizontal Flip --> False
        Brightness Alteration --> None
        Rotation --> None

    Traceback (most recent call last):
      File "main.py", line 317, in <module>
        _, current = sess.run([opt, loss], feed_dict={net_input: input_image_batch, net_output: output_image_batch})
      File "/home/azim_se/.virtualenvs/objdetTF2.7/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 900, in run
        run_metadata_ptr)
      File "/home/azim_se/.virtualenvs/objdetTF2.7/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1135, in _run
        feed_dict_tensor, options, run_metadata)
      File "/home/azim_se/.virtualenvs/objdetTF2.7/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
        run_metadata)
      File "/home/azim_se/.virtualenvs/objdetTF2.7/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [1,512,512] vs. [2]
      [[Node: mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](softmax_cross_entropy_with_logits_sg/Reshape_2, mul/y)]]
      [[Node: Mean_1/_717 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_3560_Mean_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

    Caused by op u'mul', defined at:
      File "main.py", line 202, in <module>
        losses = unweighted_loss * class_weights
      File "/home/azim_se/.virtualenvs/objdetTF2.7/local/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 979, in binary_op_wrapper
        return func(x, y, name=name)
      File "/home/azim_se/.virtualenvs/objdetTF2.7/local/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 1211, in _mul_dispatch
        return gen_math_ops.mul(x, y, name=name)
      File "/home/azim_se/.virtualenvs/objdetTF2.7/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 4759, in mul
        "Mul", x=x, y=y, name=name)
      File "/home/azim_se/.virtualenvs/objdetTF2.7/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
        op_def=op_def)
      File "/home/azim_se/.virtualenvs/objdetTF2.7/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
        op_def=op_def)
      File "/home/azim_se/.virtualenvs/objdetTF2.7/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
        self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

    InvalidArgumentError (see above for traceback): Incompatible shapes: [1,512,512] vs. [2]
      [[Node: mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](softmax_cross_entropy_with_logits_sg/Reshape_2, mul/y)]]
      [[Node: Mean_1/_717 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_3560_Mean_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

    Source code / logs

    class_dict.csv:

    name,r,g,b
    lane,128,50,100
    nonlane,0,0,0

    opened by arasharchor 14
  • slow data loading

    Data loading is not properly optimized, and the GPU is idle most of the time waiting for data. I found that the bottleneck is one_hot_it. It is hopelessly slow because of the per-pixel operations in Python, each involving a table lookup.

    On-the-fly colour-to-class mapping is a very inefficient design. Such mapping should be done offline. At the very least, an option should be provided to allow loading label images directly.

    In my case it's a binary classification problem, and after I fixed the one-hot encoding with a hack I got a 6x speedup; my GPU is now above 50% busy during training. A vectorized sketch of that kind of fix follows.
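
    For reference, a vectorized NumPy version of the colour-to-one-hot mapping along the lines of this report might look like the following sketch (whole-array comparisons per class colour instead of per-pixel Python lookups):

    import numpy as np

    def one_hot_it(label, label_values):
        # label: H x W x 3 RGB mask; label_values: list of [r, g, b] colours.
        semantic_map = []
        for colour in label_values:
            equality = np.equal(label, colour)      # H x W x 3 boolean
            class_map = np.all(equality, axis=-1)   # H x W boolean
            semantic_map.append(class_map)
        return np.stack(semantic_map, axis=-1).astype(np.float32)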

    opened by aaalgo 12
  • Reproducibility of CamVid results

    George,

    I'm having problems reproducing the results for training on CamVid. I am trying the following with no luck. I attempted to predict after training and confirmed that the prediction is also incorrect.

    TRAINING RESULTS:
    Validation precision = 0.49989
    Validation recall = 0.512134
    Validation F1 = 0.50587
    Validation IoU = 0.01776

    TRAIN:
    python main.py --mode train --dataset CamVid --model PSPNet-Res50 --batch_size 100000 --num_epoch 300

    PREDICT:
    python main.py --mode predict --dataset CamVid --model PSPNet-Res50 --image trash/in.png

    opened by jeffreylutz 12
  • Help finding output node for creating frozen graph

    I'm trying to create a frozen graph of a model, but the script requires the name of the output node. So I guess this is both a request for help and a feature request. Do you know what the output node's name is? I couldn't find it in the code, and I'm not familiar with tf.contrib.slim. As a feature, it would be nice to be able to export our model as a frozen inference graph:

    import os, argparse

    import tensorflow as tf

    def freeze_graph(model_dir, output_node_names):
        """Extract the sub graph defined by the output nodes and convert
        all its variables into constants.
        Args:
            model_dir: the root folder containing the checkpoint state file
            output_node_names: a string containing all the output node
                names, comma separated
        """
        if not tf.gfile.Exists(model_dir):
            raise AssertionError(
                "Export directory doesn't exist. Please specify an export "
                "directory: %s" % model_dir)

        if not output_node_names:
            print("You need to supply the name of a node to --output_node_names.")
            return -1

        # We retrieve our checkpoint fullpath
        checkpoint = tf.train.get_checkpoint_state(model_dir)
        input_checkpoint = checkpoint.model_checkpoint_path

        # We precise the file fullname of our freezed graph
        absolute_model_dir = "/".join(input_checkpoint.split('/')[:-1])
        output_graph = absolute_model_dir + "/frozen_model.pb"

        # We clear devices to allow TensorFlow to control on which device it will load operations
        clear_devices = True

        # We start a session using a temporary fresh Graph
        with tf.Session(graph=tf.Graph()) as sess:
            # We import the meta graph in the current default Graph
            saver = tf.train.import_meta_graph(input_checkpoint + '.meta', clear_devices=clear_devices)

            # We restore the weights
            saver.restore(sess, input_checkpoint)

            # We use a built-in TF helper to export variables to constants
            output_graph_def = tf.graph_util.convert_variables_to_constants(
                sess,  # The session is used to retrieve the weights
                tf.get_default_graph().as_graph_def(),  # The graph_def is used to retrieve the nodes
                output_node_names.split(",")  # The output node names are used to select the useful nodes
            )

            # Finally we serialize and dump the output graph to the filesystem
            with tf.gfile.GFile(output_graph, "wb") as f:
                f.write(output_graph_def.SerializeToString())
            print("%d ops in the final graph." % len(output_graph_def.node))

        return output_graph_def

    freeze_graph('checkpoints', 'outputnode???')
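
    As for actually finding the output node's name, one debugging approach (a sketch; the checkpoint path here is hypothetical) is to import the meta graph and print the op names, since the last entries usually include the network's output op:

    import tensorflow as tf

    # Restore only the graph structure, then list its op names; the
    # final entries usually include the network's output op.
    tf.train.import_meta_graph('checkpoints/model.ckpt.meta')  # hypothetical path
    node_names = [n.name for n in tf.get_default_graph().as_graph_def().node]
    print(node_names[-20:])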

    opened by ablacklama 10
  • Training loss begins at 0 and stays stuck there

    I'm trying to do segmentation with only a human class and background. I have modified main.py so that the ground-truth image dimension is 256x256x1. I removed the one-hot and reverse one-hot encoding function calls because I have only one class, and because of that the output image is always one-hot encoded. But I see from the beginning that the loss is stuck at 0 and never changes.

    I didn't choose the class-balancing loss function option. I tried changing the loss function to IoU; in this case the loss is not 0 anymore, but the loss (negative IoU) value becomes less than -1, which means something is still wrong. I tried increasing the learning rate with no effect. I have checked the input and ground-truth images by visualizing them, but didn't see any problem there.

    I removed all the augmentation. Only resized the images to 256x256 dimension before providing to the model.

    The prediction output is always an image with all pixels set to 0. I have printed the predicted image values, and they are huge negative numbers, like:

    -8136019.0, -8398163.0, -5907791.0, -10888527.0,

    I have edited the get_label_info() function so that the label info is not loaded from the CSV. Instead I did the following:

    class_names = []
    label_values = []
    class_names.append('People')
    label_values.append([1,1,1])
    
    return class_names, label_values
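
    A side note on why the loss can be identically zero here: with a single class, the softmax over the class axis is identically 1, so the softmax cross-entropy is identically 0 regardless of the logits. A minimal TF 1.x repro of that arithmetic (hypothetical shapes):

    import tensorflow as tf

    # Single-class logits: softmax over the last axis is always 1.0, so
    # cross-entropy against an all-ones one-hot label is always 0.0.
    logits = tf.random_normal([1, 4, 4, 1])
    labels = tf.ones([1, 4, 4, 1])
    loss = tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=logits)

    with tf.Session() as sess:
        print(sess.run(tf.reduce_max(loss)))  # prints 0.0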
    

    The console log with batch size 6, model DeepLabV3-Res50, and no augmentation:

    This model has 27649921 trainable parameters
    Loading the data ...
    
    ***** Begin training *****
    Dataset --> /media/zayd/FUN/DL/Datasets/processed_Segmentation_dataset
    Model --> DeepLabV3-Res50
    Crop Height --> 256
    Crop Width --> 256
    Num Epochs --> 300
    Batch Size --> 6
    Num Classes --> 1
    Data Augmentation:
    	Vertical Flip --> False
    	Horizontal Flip --> False
    	Brightness Alteration --> None
    	Rotation --> None
    
    2018-04-23 20:33:31.648008: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.22GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
    2018-04-23 20:33:32.104951: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.21GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
    [2018-04-23 20:33:33] Epoch = 0 Count = 60 Current_Loss = 0.0000 Time = 3.83
    [2018-04-23 20:33:34] Epoch = 0 Count = 120 Current_Loss = 0.0000 Time = 1.42
    [2018-04-23 20:33:36] Epoch = 0 Count = 180 Current_Loss = 0.0000 Time = 1.38
    [2018-04-23 20:33:37] Epoch = 0 Count = 240 Current_Loss = 0.0000 Time = 1.41
    [2018-04-23 20:33:39] Epoch = 0 Count = 300 Current_Loss = 0.0000 Time = 1.39
    [2018-04-23 20:33:40] Epoch = 0 Count = 360 Current_Loss = 0.0000 Time = 1.39
    [2018-04-23 20:33:41] Epoch = 0 Count = 420 Current_Loss = 0.0000 Time = 1.37
    [2018-04-23 20:33:43] Epoch = 0 Count = 480 Current_Loss = 0.0000 Time = 1.37
    [2018-04-23 20:33:44] Epoch = 0 Count = 540 Current_Loss = 0.0000 Time = 1.37
    [2018-04-23 20:33:45] Epoch = 0 Count = 600 Current_Loss = 0.0000 Time = 1.36
    [2018-04-23 20:33:47] Epoch = 0 Count = 660 Current_Loss = 0.0000 Time = 1.37
    [2018-04-23 20:33:48] Epoch = 0 Count = 720 Current_Loss = 0.0000 Time = 1.37
    [2018-04-23 20:33:50] Epoch = 0 Count = 780 Current_Loss = 0.0000 Time = 1.37
    [2018-04-23 20:33:51] Epoch = 0 Count = 840 Current_Loss = 0.0000 Time = 1.46
    [2018-04-23 20:33:52] Epoch = 0 Count = 900 Current_Loss = 0.0000 Time = 1.40
    [2018-04-23 20:33:54] Epoch = 0 Count = 960 Current_Loss = 0.0000 Time = 1.39
    [2018-04-23 20:33:55] Epoch = 0 Count = 1020 Current_Loss = 0.0000 Time = 1.37
    [2018-04-23 20:33:57] Epoch = 0 Count = 1080 Current_Loss = 0.0000 Time = 1.38
    [2018-04-23 20:33:58] Epoch = 0 Count = 1140 Current_Loss = 0.0000 Time = 1.41
    [2018-04-23 20:33:59] Epoch = 0 Count = 1200 Current_Loss = 0.0000 Time = 1.38
    [2018-04-23 20:34:01] Epoch = 0 Count = 1260 Current_Loss = 0.0000 Time = 1.42
    [2018-04-23 20:34:02] Epoch = 0 Count = 1320 Current_Loss = 0.0000 Time = 1.39
    [2018-04-23 20:34:04] Epoch = 0 Count = 1380 Current_Loss = 0.0000 Time = 1.40
    [2018-04-23 20:34:05] Epoch = 0 Count = 1440 Current_Loss = 0.0000 Time = 1.41
    [2018-04-23 20:34:06] Epoch = 0 Count = 1500 Current_Loss = 0.0000 Time = 1.44
    [2018-04-23 20:34:08] Epoch = 0 Count = 1560 Current_Loss = 0.0000 Time = 1.44
    [2018-04-23 20:34:09] Epoch = 0 Count = 1620 Current_Loss = 0.0000 Time = 1.38
    [2018-04-23 20:34:11] Epoch = 0 Count = 1680 Current_Loss = 0.0000 Time = 1.38
    [2018-04-23 20:34:12] Epoch = 0 Count = 1740 Current_Loss = 0.0000 Time = 1.38
    [2018-04-23 20:34:13] Epoch = 0 Count = 1800 Current_Loss = 0.0000 Time = 1.37
    [2018-04-23 20:34:15] Epoch = 0 Count = 1860 Current_Loss = 0.0000 Time = 1.38
    /usr/local/lib/python3.5/dist-packages/sklearn/metrics/classification.py:1135: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples.
      'precision', 'predicted', average, warn_for)
    /usr/local/lib/python3.5/dist-packages/sklearn/metrics/classification.py:1137: UndefinedMetricWarning: Recall is ill-defined and being set to 0.0 in labels with no true samples.
      'recall', 'true', average, warn_for)
    /usr/local/lib/python3.5/dist-packages/sklearn/metrics/classification.py:1135: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
      'precision', 'predicted', average, warn_for)
    /usr/local/lib/python3.5/dist-packages/sklearn/metrics/classification.py:1137: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no true samples.
      'recall', 'true', average, warn_for)
    
    Average validation accuracy for epoch # 0000 = 0.000000
    Average per class validation accuracies for epoch # 0000:
    People = 0.000000
    Validation precision =  0.0
    Validation recall =  0.0
    Validation F1 score =  0.0
    Validation IoU score =  0.0
    
    

    Links to my main.py, helpers.py, and utils.py files were included. I changed utils.py only to add debug logs. I've been stuck on this for a few days now.

    opened by calicratis19 10
  • TypeError: Expected binary or unicode string, got <built-in function input>

    Hi, I'm trying to run main.py in an environment with the installed requirements, but I think the required versions need to be specified. I believe this is caused by me using a different TensorFlow version.

    tensorflow==1.7.0
    tensorflow-gpu==1.2.1

    Traceback (most recent call last):
      File "/home/alansalinas/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py", line 518, in make_tensor_proto
        str_values = [compat.as_bytes(x) for x in proto_values]
      File "/home/alansalinas/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py", line 518, in <listcomp>
        str_values = [compat.as_bytes(x) for x in proto_values]
      File "/home/alansalinas/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/util/compat.py", line 68, in as_bytes
        (bytes_or_text,))
    TypeError: Expected binary or unicode string, got <built-in function input>

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "main.py", line 152, in <module>
        network = build_fc_densenet(input, preset_model = args.model, num_classes=num_classes)
      File "models/FC_DenseNet_Tiramisu.py", line 112, in build_fc_densenet
        stack = slim.conv2d(inputs, n_filters_first_conv, [3, 3], scope='first_conv', activation_fn=None)
      File "/home/alansalinas/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
        return func(*args, **current_args)
      File "/home/alansalinas/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1015, in convolution
        inputs = ops.convert_to_tensor(inputs)
      File "/home/alansalinas/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 950, in convert_to_tensor
        as_ref=False)
      File "/home/alansalinas/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1040, in internal_convert_to_tensor
        ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
      File "/home/alansalinas/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 235, in _constant_tensor_conversion_function
        return constant(v, dtype=dtype, name=name)
      File "/home/alansalinas/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 214, in constant
        value, dtype=dtype, shape=shape, verify_shape=verify_shape))
      File "/home/alansalinas/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py", line 522, in make_tensor_proto
        "supported type." % (type(values), values))
    TypeError: Failed to convert object of type <class 'builtin_function_or_method'> to Tensor. Contents: <built-in function input>. Consider casting elements to a supported type.

    opened by alansalinas 9
  • Fine-tuning

    Hello,

    How can I fine-tune a model on a custom dataset, possibly with a different number of classes, using pre-trained weights?

    Second, which annotation style should we follow? The annotations for each dataset are different; for example, VOC uses white don't-care borders with value 255, which others don't.

    opened by MyVanitar 9
  • When using class balance

    When I try class_balancing, I get this error and don't know how to solve it:

    Preparing the model ...
    Computing class weights for CamVid ...
    Processing image: 420 / 421
    Traceback (most recent call last):
      File "/home/ye/user/yejg/SW_DATA/Or/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1567, in _create_c_op
        c_op = c_api.TF_FinishOperation(op_desc)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimensions must be equal, but are 31 and 32 for 'mul' (op: 'Mul') with input shapes: [31], [?,?,?,32].

    During handling of the above exception, another exception occurred:

    opened by yejg2017 8
  • The Loss Increases Strangely


    Information

    Please specify the following information when submitting an issue:

    • What are your command line arguments?: python train.py --dataset Myown --crop_height 960 --crop_width 960 --model PSPNet --frontend ResNet50
    • Have you written any custom code?: I adjusted the dataset loading to my own data and changed the file class_dict.csv. The other parameters are at their defaults (lr=0.0001, decay=0.995).
    • What have you done to try and solve this issue?: No idea. I am not sure what is wrong.
    • TensorFlow version?: 1.2.0, CUDA 8.0, cuDNN 5.1

    Describe the problem

    Hi, I just trained on my own dataset for 130 epochs. The training didn't fail, but during training the average loss kept increasing, and the validation IoU is very unstable. The results are as follows.

    Moreover, my dataset has 6 classes, including the background; the training set has 880 images and the testing set has 200 images. Now I am really confused. Am I doing something wrong? Please help me solve it. Thank you all in advance.

    [Three result images were attached to the original issue]

    Source code / logs

    Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached. Try to provide a reproducible test case that is the bare minimum necessary to generate the problem.

    [2018-10-17 18:27:09] Remaining training time = 26 hours 24 minutes 29 seconds

    [2018-10-17 18:27:22] Epoch = 130 Count = 20 Current_Loss = 0.0786 Time = 11.95
    [2018-10-17 18:27:34] Epoch = 130 Count = 40 Current_Loss = 0.0495 Time = 11.95
    [2018-10-17 18:27:45] Epoch = 130 Count = 60 Current_Loss = 16.9121 Time = 11.98
    [2018-10-17 18:27:57] Epoch = 130 Count = 80 Current_Loss = 0.3031 Time = 11.95
    [2018-10-17 18:28:09] Epoch = 130 Count = 100 Current_Loss = 0.0254 Time = 11.97
    [2018-10-17 18:28:21] Epoch = 130 Count = 120 Current_Loss = 13.6251 Time = 11.93
    [2018-10-17 18:28:33] Epoch = 130 Count = 140 Current_Loss = 2.3520 Time = 11.96
    [2018-10-17 18:28:45] Epoch = 130 Count = 160 Current_Loss = 0.0803 Time = 11.96
    [2018-10-17 18:28:57] Epoch = 130 Count = 180 Current_Loss = 0.0197 Time = 11.95
    [2018-10-17 18:29:09] Epoch = 130 Count = 200 Current_Loss = 0.0311 Time = 11.98
    [2018-10-17 18:29:21] Epoch = 130 Count = 220 Current_Loss = 16.3048 Time = 11.97
    [2018-10-17 18:29:33] Epoch = 130 Count = 240 Current_Loss = 102.0692 Time = 11.99
    [2018-10-17 18:29:45] Epoch = 130 Count = 260 Current_Loss = 0.0175 Time = 12.01
    [2018-10-17 18:29:57] Epoch = 130 Count = 280 Current_Loss = 115.6432 Time = 11.96
    [2018-10-17 18:30:09] Epoch = 130 Count = 300 Current_Loss = 0.0895 Time = 12.01
    [2018-10-17 18:30:21] Epoch = 130 Count = 320 Current_Loss = 0.4005 Time = 12.01
    [2018-10-17 18:30:33] Epoch = 130 Count = 340 Current_Loss = 0.0322 Time = 12.00
    [2018-10-17 18:30:45] Epoch = 130 Count = 360 Current_Loss = 0.6240 Time = 11.98
    [2018-10-17 18:30:57] Epoch = 130 Count = 380 Current_Loss = 2.2971 Time = 11.96
    [2018-10-17 18:31:09] Epoch = 130 Count = 400 Current_Loss = 4.7438 Time = 11.96
    [2018-10-17 18:31:21] Epoch = 130 Count = 420 Current_Loss = 0.1266 Time = 11.99
    [2018-10-17 18:31:33] Epoch = 130 Count = 440 Current_Loss = 5.0472 Time = 12.03
    [2018-10-17 18:31:45] Epoch = 130 Count = 460 Current_Loss = 3.7820 Time = 11.99
    [2018-10-17 18:31:57] Epoch = 130 Count = 480 Current_Loss = 0.0595 Time = 11.94
    [2018-10-17 18:32:09] Epoch = 130 Count = 500 Current_Loss = 2.3438 Time = 11.99
    [2018-10-17 18:32:21] Epoch = 130 Count = 520 Current_Loss = 16.4954 Time = 11.97
    [2018-10-17 18:32:33] Epoch = 130 Count = 540 Current_Loss = 0.3656 Time = 11.96
    [2018-10-17 18:32:45] Epoch = 130 Count = 560 Current_Loss = 1.8269 Time = 11.95
    [2018-10-17 18:32:57] Epoch = 130 Count = 580 Current_Loss = 4.8849 Time = 11.97
    [2018-10-17 18:33:09] Epoch = 130 Count = 600 Current_Loss = 0.2584 Time = 11.93
    [2018-10-17 18:33:21] Epoch = 130 Count = 620 Current_Loss = 0.3795 Time = 11.97
    [2018-10-17 18:33:33] Epoch = 130 Count = 640 Current_Loss = 0.1069 Time = 11.97
    [2018-10-17 18:33:45] Epoch = 130 Count = 660 Current_Loss = 1.8307 Time = 11.98
    [2018-10-17 18:33:57] Epoch = 130 Count = 680 Current_Loss = 0.0496 Time = 11.92
    [2018-10-17 18:34:09] Epoch = 130 Count = 700 Current_Loss = 0.4369 Time = 11.99
    [2018-10-17 18:34:21] Epoch = 130 Count = 720 Current_Loss = 0.0194 Time = 11.98
    [2018-10-17 18:34:33] Epoch = 130 Count = 740 Current_Loss = 2.5540 Time = 11.99
    [2018-10-17 18:34:45] Epoch = 130 Count = 760 Current_Loss = 1.3295 Time = 11.96
    [2018-10-17 18:34:56] Epoch = 130 Count = 780 Current_Loss = 0.0537 Time = 11.98
    [2018-10-17 18:35:08] Epoch = 130 Count = 800 Current_Loss = 27.3824 Time = 11.97
    [2018-10-17 18:35:20] Epoch = 130 Count = 820 Current_Loss = 4.8048 Time = 11.95
    [2018-10-17 18:35:32] Epoch = 130 Count = 840 Current_Loss = 0.0680 Time = 11.94
    [2018-10-17 18:35:44] Epoch = 130 Count = 860 Current_Loss = 2.4314 Time = 12.01
    [2018-10-17 18:35:56] Epoch = 130 Count = 880 Current_Loss = 10.6609 Time = 11.96
    Saving latest checkpoint
    Saving checkpoint for this epoch
    Performing validation

    opened by qwenqw 7
  • ModuleNotFoundError: No module named 'tensorflow.contrib'

    Please fill out this issue template before submitting. Issues which do not fill out this template, or are already answered in the FAQs will simply be closed.

    Please go to Stack Overflow for help and support. Also check past issues as many are repeats. Also check out the Frequently Asked Questions (FAQs) below in case your question has already been answered in an issue!

    Issues should be one of the following:

    1. Feature requests
    2. Bug reports

    Information

    Please specify the following information when submitting an issue:

    • What are your command line arguments?:
    • Have you written any custom code?:
    • What have you done to try and solve this issue?:
    • TensorFlow version?:

    Describe the problem

    Describe the problem clearly here. Be sure to convey here why it's a bug or a feature request.

    Source code / logs

    Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached. Try to provide a reproducible test case that is the bare minimum necessary to generate the problem.

    FAQs

    • Question: I got an InvalidArgumentError saying that Dimensions of inputs should match Answer: See issue #17

    • Question: Can you upload pre-trained weights for these networks? Answer: See issue #57

    • Question: Do I need a GPU to train these models? Answer: Technically no, but I'd highly recommend it. I was able to train the models pretty well in about a day using a 1080Ti GPU. Training on CPU would take much longer than that.

    • Question: Will you be adding the FCN or U-Net models? Answer: No I won't be adding those simply because they're a few years old and state-of-the-art has moved past that.

    • Question: I got an invalid argument error when using the InceptionV4 model. Am I doing something wrong? Answer: No you're not! Due to the design of the InceptionV4 model, when you end up upsampling you do some rounding, which creates a shape mismatch. This only happens when you end up having to use end_points['pool5']. See the code for some of the models if you want to check whether the model will use end_points['pool5'].

    opened by hailuu684 0
  • Plotting Error While Doing Continue Training

    Hi, I have been working with this repository for learning purposes and have found it helpful. After training different models, I wanted to try the continue_training argument, so I set continue_training to True and epoch_start_i to 300 (as I had previously trained for 300 epochs), with the total number of epochs set to 310. But after one epoch, it throws a plotting error at: ax1.plot(range(epoch+1), avg_scores_per_epoch)

    ValueError: x and y must have same first dimension, but have shapes (300,) and (1,)

    I guess it's because avg_scores_per_epoch is initialized as an empty array. Can you kindly look into it?
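
    One possible workaround (a hypothetical sketch, not a tested fix): size the x axis to the number of scores actually collected since the restart rather than to the epoch counter:

    import matplotlib.pyplot as plt

    avg_scores_per_epoch = [0.42]  # example: one score collected since restarting
    fig, ax1 = plt.subplots()
    # Plot against however many scores exist in this run, instead of
    # assuming one score per epoch since epoch 0.
    ax1.plot(range(len(avg_scores_per_epoch)), avg_scores_per_epoch)
    fig.savefig('accuracy_vs_epochs.png')  # hypothetical filename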

    opened by hmza09 1
  • Is crop size the same as image size for production?

    Kindly need an explanation: I have 1280x720 images for training and expect the same size in production (implementation). How should I configure the crop size? Should I use 1280x720, or can any crop size be used during training and then be applied to 1280x720 images in the implementation?

    thank you

    opened by ramdhan1989 1
  • Batch size changes output with same images

    Bug report

    Information

    Please specify the following information when submitting an issue:

    • What are your command line arguments?: CUDA_VISIBLE_DEVICES=0 python -m pdb train.py --num_epochs 301 --continue_training false --dataset dataset --crop_height 352 --crop_width 480 --batch_size 4 --num_val_images 100 --model DeepLabV3_plus --frontend ResNet50

    • Have you written any custom code?: I removed data augmentation by adding "return input_image, output_image" right at the beginning of data_augmentation (and removing an empty line so that later breakpoint line numbers stay the same). I also tried both is_training=False and is_training=True.

    • What have you done to try and solve this issue?: Googled why this might happen. Tried other models.

    • TensorFlow version?: '1.13.1'

    Describe the problem

    When calling sess.run, the output for the same images differs depending on the size of the batch they were included in.

    Source code / logs

    Running in pdb, this can be done with a fresh checkout to replicate the problem. I originally found it when trying to implement batch inference in predict.py, but doing this in train.py is the quickest way for you to reproduce the problem.

    (Pdb) break train.py:197
    ...
    (Pdb) output_image_last = sess.run(network, feed_dict={net_input: np.expand_dims(input_image, axis=0)})
    (Pdb) output_images = sess.run(network, feed_dict={net_input: input_image_batch})
    (Pdb) (input_image - input_image_batch[3]).max()
    0.0
    (Pdb) (output_image_last - output_images[3]).max()
    1.0644385

    The following is another set of commands I tested from the breakpoint at train.py:197, if you want to copy-paste quickly; for these you must remove data augmentation. These commands set up batches of size 2 and 4 within pdb and test that the same input images produce different outputs depending on batch size.

    output_image_last_alone = sess.run(network, feed_dict={net_input: np.expand_dims(input_image, axis=0)})
    output_images_orig4 = sess.run(network, feed_dict={net_input: input_image_batch})

    input_image_batch_manual2 = []

    index = i * args.batch_size + j-1
    id = id_list[index]
    input_image2 = utils.load_image(train_input_names[id])
    output_image2 = utils.load_image(train_output_names[id])

    index = i * args.batch_size + j
    id = id_list[index]
    input_image3 = utils.load_image(train_input_names[id])
    output_image3 = utils.load_image(train_output_names[id])
    input_image2, output_image2 = data_augmentation(input_image2, output_image2)
    input_image3, output_image3 = data_augmentation(input_image3, output_image3)
    input_image2 = np.float32(input_image2) / 255.0
    input_image3 = np.float32(input_image3) / 255.0
    input_image_batch_manual2.append(np.expand_dims(input_image2, axis=0))
    input_image_batch_manual2.append(np.expand_dims(input_image3, axis=0))
    input_image_batch_manual2 = np.squeeze(np.stack(input_image_batch_manual2, axis=1))
    output_images_batch2 = sess.run(network, feed_dict={net_input: input_image_batch_manual2})

    input_image_batch_manual4 = []
    index = i * args.batch_size + j-3
    id = id_list[index]
    input_image0 = utils.load_image(train_input_names[id])
    output_image0 = utils.load_image(train_output_names[id])

    index = i * args.batch_size + j-2
    id = id_list[index]
    input_image1 = utils.load_image(train_input_names[id])
    output_image1 = utils.load_image(train_output_names[id])
    input_image0, output_image0 = data_augmentation(input_image0, output_image0)
    input_image1, output_image1 = data_augmentation(input_image1, output_image1)
    input_image0 = np.float32(input_image0) / 255.0
    input_image1 = np.float32(input_image1) / 255.0
    input_image_batch_manual4.append(np.expand_dims(input_image0, axis=0))
    input_image_batch_manual4.append(np.expand_dims(input_image1, axis=0))
    index = i * args.batch_size + j-1
    id = id_list[index]
    input_image2 = utils.load_image(train_input_names[id])
    output_image2 = utils.load_image(train_output_names[id])

    index = i * args.batch_size + j
    id = id_list[index]
    input_image3 = utils.load_image(train_input_names[id])
    output_image3 = utils.load_image(train_output_names[id])
    input_image2, output_image2 = data_augmentation(input_image2, output_image2)
    input_image3, output_image3 = data_augmentation(input_image3, output_image3)
    input_image2 = np.float32(input_image2) / 255.0
    input_image3 = np.float32(input_image3) / 255.0
    input_image_batch_manual4.append(np.expand_dims(input_image2, axis=0))
    input_image_batch_manual4.append(np.expand_dims(input_image3, axis=0))
    input_image_batch_manual4 = np.squeeze(np.stack(input_image_batch_manual4, axis=1))
    output_images_batch4 = sess.run(network, feed_dict={net_input: input_image_batch_manual4})

    (input_image - input_image_batch[3]).max()           # input_image is the 4th image in the batch
    (input_image - input_image_batch_manual2[1]).max()   # input_image is the 2nd image in this manually loaded batch
    (input_image - input_image_batch_manual4[3]).max()   # input_image is the 4th image in this manually loaded batch

    (output_image_last_alone - output_images_orig4[3]).max()   # the single-image run produces a different output
    (output_image_last_alone - output_images_batch2[1]).max()  # the single-image run produces a different output
    (output_image_last_alone - output_images_batch4[3]).max()  # the single-image run produces a different output
    (output_images_batch2[1] - output_images_batch4[3]).max()  # batch size 2 produces different output than batch size 4

    (output_images_orig4 - output_images_batch4).max()   # the manually loaded batch produces the same output as the original batch

    opened by RorryB 0
Owner
George Seif
Machine Learning Engineer | twitter.com/GeorgeSeif94