Pixel Transposed Convolutional Networks
Created by Hongyang Gao, Hao Yuan, Zhengyang Wang and Shuiwang Ji at Texas A&M University.
Introduction
The pixel transposed convolutional layer (PixelTCL) is a more effective way to perform up-sampling than the standard transposed convolutional layer.
Detailed information about PixelTCL is provided in our [arXiv tech report](https://arxiv.org/abs/1705.06820).
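To give an intuition for the scheme, below is a minimal NumPy sketch (not code from this repository) of the final interleaving step for factor-2 up-sampling: four intermediate feature maps, which PixelTCL generates sequentially so that adjacent output pixels are directly related, are dilated and combined into one output of twice the spatial size.

```python
import numpy as np

def interleave_2x(f1, f2, f3, f4):
    """Interleave four (H, W, C) feature maps into one (2H, 2W, C) output.

    This sketch shows only the dilation/interleaving step; in PixelTCL the
    four maps are produced sequentially, each conditioned on the previous
    ones, which is what relates adjacent pixels in the up-sampled output.
    """
    h, w, c = f1.shape
    out = np.zeros((2 * h, 2 * w, c), dtype=f1.dtype)
    out[0::2, 0::2] = f1  # top-left pixel of each 2x2 block
    out[0::2, 1::2] = f2  # top-right
    out[1::2, 0::2] = f3  # bottom-left
    out[1::2, 1::2] = f4  # bottom-right
    return out
```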
Citation
If you use this code, please cite our paper:
```
@article{gao2017pixel,
  title={Pixel Transposed Convolutional Networks},
  author={Hongyang Gao and Hao Yuan and Zhengyang Wang and Shuiwang Ji},
  journal={arXiv preprint arXiv:1705.06820},
  year={2017}
}
```
Results
Semantic segmentation
Comparison of semantic segmentation results. The first and second rows are the input images and ground truth labels, respectively. The third and fourth rows are the results of using the regular transposed convolution and our proposed pixel transposed convolution, respectively.
Generating real images (VAE)
Sample face images generated by VAEs when trained on the CelebA dataset. The first two rows are images generated by a standard VAE with transposed convolutional layers for up-sampling. The last two rows are images generated by the same VAE model, but using PixelTCL for up-sampling in the generator network.
System requirements
Programming language
Python 3.5+
Python Packages
tensorflow (CPU) or tensorflow-gpu (GPU), numpy, h5py, progressbar, PIL (Pillow), scipy
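These can typically be installed with pip; note that PIL is provided by the Pillow package, and that this codebase predates TensorFlow 2, so a 1.x version is assumed below:
pip install "tensorflow<2" numpy h5py progressbar Pillow scipy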
Prepare data
In this project, we provide a set of sample datasets for training, validation, and testing. If you want to train on other data such as PASCAL, prepare the h5 files as required; utils/h5_utils.py can be used to generate them.
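If utils/h5_utils.py does not fit your data, the sketch below shows how a compatible h5 file might be produced with h5py. The dataset keys 'X' and 'Y' and the array shapes are assumptions for illustration; they must match what the data reader in this repository expects.

```python
import h5py
import numpy as np

# Hypothetical example arrays: N images of size H x W with C channels, and
# one integer class label per pixel. Replace these with your real data.
images = np.random.rand(100, 256, 256, 3).astype(np.float32)  # (N, H, W, C)
labels = np.random.randint(0, 21, size=(100, 256, 256))       # (N, H, W)

# NOTE: the key names 'X' and 'Y' are assumptions; check the reader code
# (e.g., utils/h5_utils.py) for the names it actually expects.
with h5py.File('train.h5', 'w') as f:
    f.create_dataset('X', data=images)
    f.create_dataset('Y', data=labels)
```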
Configure the network
All network hyperparameters are configured in main.py; a sketch after the lists below illustrates how such options are typically declared.
Training
max_step: the total number of training steps
test_step: the interval (in steps) at which to run a mini test or validation
save_step: the interval (in steps) at which to save the model
summary_step: the interval (in steps) at which to save the summary
Data
data_dir: data directory
train_data: h5 file for training
valid_data: h5 file for validation
test_data: h5 file for testing
batch: batch size
channel: input image channel number
height, width: height and width of input image
Debug
logdir: where to store log
modeldir: where to store saved models
sampledir: where to store predicted samples; please add a / at the end for convenience
model_name: the name prefix of saved models
reload_step: the checkpoint step from which to resume training
test_step: the checkpoint step to use for testing or prediction
random_seed: random seed for tensorflow
Network architecture
network_depth: the depth of the U-Net, including the bottom layer
class_num: the number of output classes, usually the number of classes plus one for the background
start_channel_num: the number of output channels of the first convolutional layer
conv_name: which convolutional layer to use in the decoder: conv2d for the standard convolutional layer, or ipixel_cl for the input pixel convolutional layer proposed in our paper
deconv_name: which up-sampling layer to use in the decoder: deconv for the standard transposed convolutional layer, ipixel_dcl for the input pixel transposed convolutional layer, or pixel_dcl for the pixel transposed convolutional layer proposed in our paper
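As an illustration only (the actual definitions and default values live in main.py of this repository), options like the ones above are typically declared with TensorFlow 1.x flags:

```python
import tensorflow as tf

# A sketch of how the options above might be declared; the defaults here
# are placeholders, not the values used in this repository.
flags = tf.app.flags
flags.DEFINE_integer('max_step', 60000, 'number of training steps')
flags.DEFINE_integer('batch', 4, 'batch size')
flags.DEFINE_integer('class_num', 2, 'number of classes, including background')
flags.DEFINE_string('data_dir', './data/', 'data directory')
flags.DEFINE_string('conv_name', 'conv2d', 'conv2d or ipixel_cl')
flags.DEFINE_string('deconv_name', 'pixel_dcl', 'deconv, ipixel_dcl, or pixel_dcl')
conf = flags.FLAGS
```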
Training and Testing
Start training
After configuring the network, we can start training. Run
python main.py
The training of a U-Net for semantic segmentation will start.
Training process visualization
We employ TensorBoard to visualize the training process.
tensorboard --logdir=logdir/
The segmentation results, including training and validation accuracies, as well as the prediction outputs, are all available in TensorBoard.
Testing and prediction
Select a good checkpoint to test your model, based on validation performance or other measures.
Fill in test_step in main.py with the checkpoint step you want to test, then run
python main.py --action=test
The final output includes the accuracy and mean_iou.
If you want to make some predictions, run
python main.py --action=predict
The colored segmentation predictions will be saved in the sampledir set in main.py.
Use PixelDCL in other models
If you want to use the pixel transposed convolutional layer in other models, just copy the file
utils/pixel_dcn.py
and use it in your model:
```python
from pixel_dcn import pixel_dcl, ipixel_dcl, ipixel_cl

outputs = pixel_dcl(inputs, out_num, kernel_size, scope)
```
Currently, this version only supports up-sampling by a factor of 2, e.g., from 2x2 to 4x4. We may provide a more flexible version in the future.
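For reference, here is a hedged usage sketch in TensorFlow 1.x; the shapes are arbitrary, and any additional arguments (such as an activation function or data format) should be checked against utils/pixel_dcn.py:

```python
import tensorflow as tf
from pixel_dcn import pixel_dcl

# An NHWC decoder feature map, e.g., 16x16 with 64 channels.
inputs = tf.placeholder(tf.float32, [None, 16, 16, 64])

# Up-sample by a factor of 2 (16x16 -> 32x32) with 32 output channels;
# the scope argument names the variables created by the layer.
outputs = pixel_dcl(inputs, out_num=32, kernel_size=(3, 3), scope='pixel_dcl1')
```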