Tip: a more recent scene text detection algorithm, PixelLink, has been implemented here: https://github.com/ZJULearning/pixel_link
Contents:
- Introduction
- Installation & requirements
- Datasets
- Problems
- Models
- Test Your own images
- Training and evaluation
- Some Comments
- Some Notes On Implementation Detail
Introduction
This is a re-implementation of the SegLink text detection algorithm described in the paper Detecting Oriented Text in Natural Images by Linking Segments, by Baoguang Shi, Xiang Bai, and Serge Belongie.
Installation & requirements
- tensorflow-gpu 1.1.0
- cv2. I'm using 2.4.9.1, but other versions below 3 should work too. If not, try switching to the same version as mine.
- Download the project pylib and add its src folder to your PYTHONPATH (see the setup sketch below).
If any other requirements are unmet, just install them following the error message.
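For reference, a minimal setup might look like the following sketch. The pip command and the pylib repository location are assumptions (I take pylib to mean https://github.com/dengdan/pylib); adjust the paths to your environment.

```
# Sketch only; the pylib location below is an assumption, adjust to where you downloaded it
pip install tensorflow-gpu==1.1.0
git clone https://github.com/dengdan/pylib.git ~/pylib
export PYTHONPATH=~/pylib/src:$PYTHONPATH
```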
Datasets
If you want to train your own model, convert the datasets into tfrecords format using the scripts in the datasets directory.
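For example, the conversion might be invoked as below; the script names are hypothetical, so check the datasets directory for the actual names and required arguments:

```
# Hypothetical script names -- look into datasets/ for the real conversion scripts and their flags
python datasets/synthtext_to_tfrecords.py     # convert SynthText
python datasets/icdar2015_to_tfrecords.py     # convert IC15-train
```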
Problems
The convergence of my SegLink implementation is quite slow compared with that described in the paper. For example, the authors of the SegLink paper report that a good result can be obtained by training on SynthText for less than 100k iterations and on IC15-train for less than 10k iterations. However, with my implementation I have to train on SynthText for about 200k iterations and then for more than 100k additional iterations on IC15-train to get a competitive result.
Several factors may contribute to the slow convergence of my model:
- Batch size. I don't have four 12 GB Titans for training, as used in the paper. Instead, I trained my model on two 8 GB GeForce GTX 1080s or two Titans.
- Learning rate. The paper uses 10^-3 and then 10^-4, but I adopted a fixed learning rate of 10^-4.
- Different initialization model. I used the pretrained VGG model from SSD-Caffe trained on COCO, because I thought it would be better than VGG trained on ImageNet. However, this assumption does not seem to hold.
- There may be other differences I am not aware of.
Models
Two models, trained on SynthText and IC15-train, can be downloaded.
- seglink-384. Trained with an image size of 384x384, the same as in the paper. Its Hmean is comparable to the result reported in the paper (hust_orientedText is the paper's result).
- seglink-512. Trained with an image size of 512x512, about one point better than the 384x384 model.
They were trained:
- on SynthText for about 200k iterations, and on IC15-train for 100k~200k iterations;
- with learning_rate = 10^-4;
- on two GPUs;
- 384: GTX 1080, batch_size = 24; 512: Titan, batch_size = 20.
Both models perform best at seg_conf_threshold=0.8 and link_conf_threshold=0.5, which is another difference from the paper, where 0.9 and 0.7 are used respectively.
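To use these thresholds when testing, they presumably have to be passed to test_seglink.py (described in the next section). The flag names below are an assumption; check the flags actually defined in the script:

```
# Hypothetical flag names -- verify against the flag definitions in test_seglink.py
python test_seglink.py \
    --checkpoint_path=${CKPT_PATH} \
    --dataset_dir=${DATASET_DIR} \
    --seg_conf_threshold=0.8 \
    --link_conf_threshold=0.5
```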
Test Your own images
Use the script test_seglink.py; a shortcut is provided in scripts/test.sh.
Go to the seglink root directory and execute:
./scripts/test.sh GPU_ID CKPT_PATH DATASET_DIR
For example:
./scripts/test.sh 0 ~/models/seglink/model.ckpt-217867 ~/dataset/ICDAR2015/Challenge4/ch4_training_images
I have only tested my models on IC15-test, but any other images can be used: just put your images into a directory and pass its path to the command as DATASET_DIR.
A batch of txt files and a zip file are created after testing. If you are using IC15-test, you can upload the zip file to the ICDAR evaluation server directly.
The txt files are placed in a subdirectory of the checkpoint directory; they contain the detected bounding boxes and can be visualized using the script visualize_detection_result.py.
The command looks like:
python visualize_detection_result.py \
    --image=<directory containing your images> \
    --det=<directory of the txt files output by test_seglink.py> \
    --output=<directory where images with detection results drawn on them are saved>
For example:
python visualize_detection_result.py \
--image=~/dataset/ICDAR2015/Challenge4/ch4_training_images/ \
--det=~/models/seglink/seglink_icdar2015_without_ignored/eval/icdar2015_train/model.ckpt-72885/seg_link_conf_th_0.900000_0.700000/txt \
--output=~/temp/no-use/seglink_result_512_train
Training and evaluation
Training requires data preparation, i.e., converting the data into tfrecords; the conversion scripts are in the datasets directory. The scripts train_seglink.py and eval_seglink.py are the training and evaluation scripts respectively. In particular, I have implemented an offline evaluation function which calculates Recall/Precision/Hmean in the same way as the ICDAR test server (Hmean being the harmonic mean of precision and recall, 2*P*R/(P+R)) and can be used for cross-validation and grid search. The resulting scores may differ slightly from those of the test server, but not by much. Sorry for the incomplete documentation here; read and modify the scripts if you want to train your own model.
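As a rough sketch of the workflow (the flag names here are assumptions; the actual flags are defined at the top of each script, so check there first):

```
# Hypothetical flags -- check train_seglink.py / eval_seglink.py for the real ones
python train_seglink.py --train_dir=${TRAIN_DIR} --dataset_dir=${TFRECORDS_DIR}
python eval_seglink.py --checkpoint_path=${TRAIN_DIR} --dataset_dir=${EVAL_TFRECORDS_DIR}
```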
Some Comments
Thanks should be given to the authors of the SegLink paper: Baoguang Shi, Xiang Bai, and Serge Belongie.
EAST is another text detection paper accepted at CVPR 2017, and its reported result is better than that of SegLink. However, when both use the same VGG16 backbone, their performance is quite similar.
If you have any problems, contact me through GitHub issues.
Some Notes On Implementation Detail
A note (in Chinese) on how the groundtruth is calculated: http://fromwiz.com/share/s/34GeEW1RFx7x2iIM0z1ZXVvc2yLl5t2fTkEg2ZVhJR2n50xg