Detecting Text in Natural Image with Connectionist Text Proposal Network (ECCV'16)

Overview

This code implements CTPN for scene text detection, as described in:

Z. Tian, W. Huang, T. He, P. He and Y. Qiao: Detecting Text in Natural Image with
Connectionist Text Proposal Network, ECCV, 2016.

Online demo is available at: textdet.com

This demo code (with our trained model) performs text-line detection (without the side-refinement part).

Required hardware

You need a GPU. If you use CUDNN, about 1.5GB of free GPU memory is required. If you don't use CUDNN, you will need about 5GB of free memory, and the testing time will increase slightly. We therefore strongly recommend using CUDNN.

It is also possible to run the program on the CPU only, but it is extremely slow because the CPU implementation is not optimized.

Required software

Python 2.7, Cython, and all of Caffe's dependencies.

How to run this code

  1. Clone this repository with git clone https://github.com/tianzhi0549/CTPN.git. This checks out the CTPN code together with the Caffe version we ship.

  2. Install the Caffe we ship, following the steps below.

    • Install Caffe's dependencies. You can follow this tutorial. Note: we need Python support, and the CUDA version we use is 7.0.

    • Enter the directory caffe.

    • Run cp Makefile.config.example Makefile.config.

    • Open Makefile.config and set WITH_PYTHON_LAYER := 1. If you want to use CUDNN, also set USE_CUDNN := 1. Uncomment CPU_ONLY := 1 if you want to compile without GPU support.

      Note: To use CUDNN, you need to download CUDNN from NVIDIA's official website, and install it in advance. The CUDNN version we use is 3.0.

    • Run make -j && make pycaffe. (A quick import check for the freshly built pycaffe is sketched after step 5 below.)

  3. After Caffe is set up, download the trained model (about 78MB) from Google Drive or our website, and place it in the directory models. The model's file name should be ctpn_trained_model.caffemodel.

  4. Make sure you are in the root directory of the code, then run make to compile some Cython files.

  5. Run python tools/demo.py for a demo, or python tools/demo.py --no-gpu to run it in CPU mode.
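
Before running the demo, it can help to confirm that the pycaffe built in step 2 imports cleanly. This is a minimal sketch, assuming you run it from the repository root with the shipped caffe/python directory:

```python
# Quick sanity check for the pycaffe build.
import sys
sys.path.insert(0, "./caffe/python")  # path of the Caffe shipped with CTPN

import caffe

caffe.set_mode_cpu()  # or: caffe.set_mode_gpu(); caffe.set_device(0)
print("pycaffe imported OK")
```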

How to use other Caffe

If you want to use another Caffe instead of the one we ship for some reason, you need to migrate the following layers into that Caffe:

  • Reverse
  • Transpose
  • Lstm
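
A quick way to confirm the migration worked is to check the layer registry from pycaffe. This is a sketch; caffe.layer_type_list() exists in BVLC Caffe builds of this era, but verify that your fork exposes it:

```python
import caffe

# The three CTPN-specific layers must be registered, otherwise loading
# models/deploy.prototxt fails with "Unknown layer type".
required = {"Reverse", "Transpose", "Lstm"}
missing = required - set(caffe.layer_type_list())
print("missing layers:", missing or "none")
```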

License

The codes are released under the MIT License.

Comments
  • Implementation detail of training code

    Hi Tianzhi, I tried to implement the CTPN training code on the py-faster-rcnn framework (by RBG), but the results were different from yours (worse, of course).

    1. Loss function. Did you revise the loss function (e.g. SmoothL1Loss) in the training code?
    2. Heights of vertical proposals in a text line. A complete text line consists of several vertical anchors in sequence, and their heights vary only slightly in your implementation; in my implementation, however, the heights vary enormously. Sometimes a proposal fits tightly to the boundary of a single character, so if the heights of the characters in a text line differ greatly, the heights of the proposals differ too. So I want to ask: how did you make the heights and y-coordinates of a proposal sequence uniform? Via the LSTM, or some other change in a Python layer? If the answer is the LSTM, does that mean the LSTM is not working in my implementation? FTPN (CTPN with no RNN) seems to have the same problem.
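
    For reference, the paper regresses only two vertical values per anchor (its Eq. 2), keeping the x-center and the 16px width fixed by the anchor grid; that is what keeps the heights of a proposal sequence coherent. A minimal numpy sketch of those targets (variable names are mine, not from the released code):

    ```python
    import numpy as np

    def vertical_targets(gt_cy, gt_h, anchor_cy, anchor_h):
        """CTPN's vertical-only regression (Eq. 2 of the paper):
        v_c = (cy - cy_a) / h_a and v_h = log(h / h_a).
        x-center and width are fixed by the 16px anchor grid, so the
        horizontal extent of a proposal is never free to drift."""
        v_c = (gt_cy - anchor_cy) / anchor_h
        v_h = np.log(gt_h / anchor_h)
        return v_c, v_h

    def apply_deltas(v_c, v_h, anchor_cy, anchor_h):
        # Inverse transform used at test time to recover cy and h.
        return v_c * anchor_h + anchor_cy, np.exp(v_h) * anchor_h
    ```

    If a full 4-coordinate parameterization is regressed instead (as in stock Faster R-CNN), proposals can snap to individual characters, which matches the symptom described above.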
    opened by Xiangyu-CAS 38
  • Performance on tilt and perspective texts

    Dear Tianzhi: I tried your demo and obtained exactly the same result on ICDAR 2013 Challenge 2 as you submitted. It works perfectly! BTW, OpenCV 3 and CUDA 7.5 are compatible with this project. Now I am trying to test the performance on ICDAR 2015 Challenge 4, which consists of many tilted and perspective texts, but the bounding box returned by your method is a rectangle around the whole text line, instead of separated words represented by 8 coordinates. Did you submit the rectangle (4 coordinates) of the whole text line in Challenge 4 as you did in Challenge 2? If not, what kind of adjustment was applied? The paper does not mention anything about tilted and perspective texts, so I am a little confused.

    Best Regards

    opened by Xiangyu-CAS 23
  • Changes to Caffe compared to the official one?

    Is it possible to list the major changes in the version of Caffe you use? I want to know the potential issues/conflicts when merging it with a newer version of Caffe.

    My GPU has to run with CUDA 8.0, which is not compatible with this version of Caffe.

    opened by qingswu 16
  • Training does not converge when adding the LSTM layer

    I implemented the RPN without LSTM from the paper, based on the Faster R-CNN code. The results are very good for horizontal text. But when I add the bidirectional LSTM layer after the last conv layer, the model does not converge, and the scores are the same for all images. Has anyone else met this problem?

    ```
    name: "VGG_ILSVRC_16_layers"
    layer { name: 'input-data' type: 'Python' top: 'data' top: 'im_info' top: 'gt_boxes' python_param { module: 'roi_data_layer.layer' layer: 'RoIDataLayer' param_str: "'num_classes': 2" } }
    layer { name: "conv1_1" type: "Convolution" bottom: "data" top: "conv1_1" param { lr_mult: 0 decay_mult: 0 } param { lr_mult: 0 decay_mult: 0 } convolution_param { num_output: 64 pad: 1 kernel_size: 3 } }
    layer { name: "relu1_1" type: "ReLU" bottom: "conv1_1" top: "conv1_1" }
    layer { name: "conv1_2" type: "Convolution" bottom: "conv1_1" top: "conv1_2" param { lr_mult: 0 decay_mult: 0 } param { lr_mult: 0 decay_mult: 0 } convolution_param { num_output: 64 pad: 1 kernel_size: 3 } }
    layer { name: "relu1_2" type: "ReLU" bottom: "conv1_2" top: "conv1_2" }
    layer { name: "pool1" type: "Pooling" bottom: "conv1_2" top: "pool1" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
    layer { name: "conv2_1" type: "Convolution" bottom: "pool1" top: "conv2_1" param { lr_mult: 0 decay_mult: 0 } param { lr_mult: 0 decay_mult: 0 } convolution_param { num_output: 128 pad: 1 kernel_size: 3 } }
    layer { name: "relu2_1" type: "ReLU" bottom: "conv2_1" top: "conv2_1" }
    layer { name: "conv2_2" type: "Convolution" bottom: "conv2_1" top: "conv2_2" param { lr_mult: 0 decay_mult: 0 } param { lr_mult: 0 decay_mult: 0 } convolution_param { num_output: 128 pad: 1 kernel_size: 3 } }
    layer { name: "relu2_2" type: "ReLU" bottom: "conv2_2" top: "conv2_2" }
    layer { name: "pool2" type: "Pooling" bottom: "conv2_2" top: "pool2" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
    layer { name: "conv3_1" type: "Convolution" bottom: "pool2" top: "conv3_1" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 256 pad: 1 kernel_size: 3 } }
    layer { name: "relu3_1" type: "ReLU" bottom: "conv3_1" top: "conv3_1" }
    layer { name: "conv3_2" type: "Convolution" bottom: "conv3_1" top: "conv3_2" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 256 pad: 1 kernel_size: 3 } }
    layer { name: "relu3_2" type: "ReLU" bottom: "conv3_2" top: "conv3_2" }
    layer { name: "conv3_3" type: "Convolution" bottom: "conv3_2" top: "conv3_3" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 256 pad: 1 kernel_size: 3 } }
    layer { name: "relu3_3" type: "ReLU" bottom: "conv3_3" top: "conv3_3" }
    layer { name: "pool3" type: "Pooling" bottom: "conv3_3" top: "pool3" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
    layer { name: "conv4_1" type: "Convolution" bottom: "pool3" top: "conv4_1" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 512 pad: 1 kernel_size: 3 } }
    layer { name: "relu4_1" type: "ReLU" bottom: "conv4_1" top: "conv4_1" }
    layer { name: "conv4_2" type: "Convolution" bottom: "conv4_1" top: "conv4_2" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 512 pad: 1 kernel_size: 3 } }
    layer { name: "relu4_2" type: "ReLU" bottom: "conv4_2" top: "conv4_2" }
    layer { name: "conv4_3" type: "Convolution" bottom: "conv4_2" top: "conv4_3" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 512 pad: 1 kernel_size: 3 } }
    layer { name: "relu4_3" type: "ReLU" bottom: "conv4_3" top: "conv4_3" }
    layer { name: "pool4" type: "Pooling" bottom: "conv4_3" top: "pool4" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
    layer { name: "conv5_1" type: "Convolution" bottom: "pool4" top: "conv5_1" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 512 pad: 1 kernel_size: 3 } }
    layer { name: "relu5_1" type: "ReLU" bottom: "conv5_1" top: "conv5_1" }
    layer { name: "conv5_2" type: "Convolution" bottom: "conv5_1" top: "conv5_2" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 512 pad: 1 kernel_size: 3 } }
    layer { name: "relu5_2" type: "ReLU" bottom: "conv5_2" top: "conv5_2" }
    layer { name: "conv5_3" type: "Convolution" bottom: "conv5_2" top: "conv5_3" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 512 pad: 1 kernel_size: 3 } }
    layer { name: "relu5_3" type: "ReLU" bottom: "conv5_3" top: "conv5_3" }

    #========= RPN ============
    # prepare lstm inputs
    layer { name: "im2col" bottom: "conv5_3" top: "im2col" type: "Im2col" convolution_param { pad: 1 kernel_size: 3 } }
    layer { name: "im2col_transpose" top: "im2col_transpose" bottom: "im2col" type: "Transpose" transpose_param { dim: 3 dim: 2 dim: 0 dim: 1 } }
    layer { name: "lstm_input" type: "Reshape" bottom: "im2col_transpose" top: "lstm_input" reshape_param { shape { dim: -1 } axis: 1 num_axes: 2 } }
    layer { name: "lstm" type: "Lstm" bottom: "lstm_input" top: "lstm" lstm_param { num_output: 128 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" } clipping_threshold: 1 } }

    # ===================== rlstm =====================
    layer { name: "lstm-reverse1" type: "Reverse" bottom: "lstm_input" top: "rlstm_input" reverse_param { axis: 0 } }
    layer { name: "rlstm" type: "Lstm" bottom: "rlstm_input" top: "rlstm-output" lstm_param { num_output: 128 } }
    layer { name: "lstm-reverse2" type: "Reverse" bottom: "rlstm-output" top: "rlstm" reverse_param { axis: 0 } }

    # merge lstm and rlstm
    layer { name: "merge_lstm_rlstm" type: "Concat" bottom: "lstm" bottom: "rlstm" top: "merge_lstm_rlstm" concat_param { axis: 2 } }
    layer { name: "lstm_output_reshape" type: "Reshape" bottom: "merge_lstm_rlstm" top: "lstm_output_reshape" reshape_param { shape { dim: -1 dim: 1 } axis: 1 num_axes: 1 } }

    # transpose size of output as (N, C, H, W)
    layer { name: "lstm_output" type: "Transpose" bottom: "lstm_output_reshape" top: "lstm_output" transpose_param { dim: 2 dim: 3 dim: 1 dim: 0 } }
    layer { name: "fc" bottom: "lstm_output" top: "fc" type: "Convolution" convolution_param { num_output: 512 kernel_size: 1 } }
    layer { name: "relu_fc" type: "ReLU" bottom: "fc" top: "fc" }
    layer {
      name: "rpn_cls_score" type: "Convolution" bottom: "fc" top: "rpn_cls_score"
      param { lr_mult: 1.0 } param { lr_mult: 2.0 }
      convolution_param {
        num_output: 20 # 2(bg/fg) * 9(anchors)
        kernel_size: 1 pad: 0 stride: 1
      }
    }
    layer {
      name: "rpn_bbox_pred" type: "Convolution" bottom: "fc" top: "rpn_bbox_pred"
      param { lr_mult: 1.0 } param { lr_mult: 2.0 }
      convolution_param {
        num_output: 20 # 4 * 9(anchors)
        kernel_size: 1 pad: 0 stride: 1
      }
    }
    layer { bottom: "rpn_cls_score" top: "rpn_cls_score_reshape" name: "rpn_cls_score_reshape" type: "Reshape" reshape_param { shape { dim: 0 dim: 2 dim: -1 dim: 0 } } }
    layer { name: 'rpn-data' type: 'Python' bottom: 'rpn_cls_score' bottom: 'gt_boxes' bottom: 'im_info' bottom: 'data' top: 'rpn_labels' top: 'rpn_bbox_targets' top: 'rpn_bbox_inside_weights' top: 'rpn_bbox_outside_weights' python_param { module: 'rpn.anchor_target_layer' layer: 'AnchorTargetLayer' param_str: "'feat_stride': 16" } }
    layer { name: "rpn_loss_cls" type: "SoftmaxWithLoss" bottom: "rpn_cls_score_reshape" bottom: "rpn_labels" propagate_down: 1 propagate_down: 0 top: "rpn_cls_loss" loss_weight: 1 loss_param { ignore_label: -1 normalize: true } }
    layer { name: "rpn_loss_bbox" type: "SmoothL1Loss" bottom: "rpn_bbox_pred" bottom: "rpn_bbox_targets" bottom: 'rpn_bbox_inside_weights' bottom: 'rpn_bbox_outside_weights' top: "rpn_loss_bbox" loss_weight: 1 smooth_l1_loss_param { sigma: 3.0 } }
    ```

    opened by dajiangxiaoyan 13
  • Implementation of Side-Refinement

    Hi Tianzhi, I am trying to implement the training phase of CTPN with TensorFlow, based on your code and Faster R-CNN. However, I don't quite understand the meaning of side-refinement (offset regression) in your paper. The problem is that I don't understand the meaning of x_side in your paper, so I can't implement the training loss function. Can you release the code for side-refinement, or tell me how to calculate x_side? It would help me understand your excellent idea. Many thanks!
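
    For context, Eq. 4 of the paper defines the side-refinement target as o = (x_side − c_x^a) / w_a, where x_side is the ground-truth x-coordinate of the nearest (left or right) side of the text line, c_x^a is the x-center of the anchor, and w_a = 16 is the fixed anchor width. A small sketch of that computation, as I read the paper (not the released training code):

    ```python
    ANCHOR_WIDTH = 16.0  # w_a in the paper: fixed anchor width

    def side_refinement_target(x_side, anchor_cx):
        """Eq. 4 of the paper: o* = (x_side - c_x^a) / w_a.
        Computed only for the leftmost/rightmost anchors of a text line."""
        return (x_side - anchor_cx) / ANCHOR_WIDTH

    def refine_side(o_pred, anchor_cx):
        # Inverse at test time: predicted offset back to an x-coordinate.
        return o_pred * ANCHOR_WIDTH + anchor_cx
    ```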

    opened by senliuy 10
  • "caffe.LayerParameter" has no field named "transpose_param"

    I am trying to run the demo.py file and getting the following error. Is there something wrong with what I am doing? I also have the BVLC version of Caffe installed in a different location.

    This is the error I am getting while trying to run demo.py:

    ```
    WARNING: Logging before InitGoogleLogging() is written to STDERR
    W0104 15:27:11.308679 30453 _caffe.cpp:154] DEPRECATION WARNING - deprecated use of Python interface
    W0104 15:27:11.308717 30453 _caffe.cpp:155] Use this instead (with the named "weights" parameter):
    W0104 15:27:11.308722 30453 _caffe.cpp:157] Net('models/deploy.prototxt', 1, weights='models/ctpn_trained_model.caffemodel')
    [libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 387:19: Message type "caffe.LayerParameter" has no field named "transpose_param".
    F0104 15:27:11.310125 30453 upgrade_proto.cpp:79] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: models/deploy.prototxt
    *** Check failure stack trace: ***
    Aborted (core dumped)
    ```
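
    This particular parse failure usually means a stock BVLC Caffe (which has no transpose_param) was picked up instead of the Caffe shipped with CTPN. A hedged sketch of forcing the shipped build onto the path first (the checkout path below is an assumption, adjust it to your machine):

    ```python
    import sys

    # Put the CTPN-shipped pycaffe ahead of any system-wide BVLC install;
    # otherwise deploy.prototxt's transpose_param cannot be parsed.
    sys.path.insert(0, "/path/to/CTPN/caffe/python")  # adjust to your checkout

    import caffe
    ```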

    opened by mahaling 9
  • How to run this code

    First of all, thank you tianzhi0549 for your great work. I tried to run this code but I can't, and it seems a lot of people could run it, so I need help, please. If anyone could write the steps in a clearer way, or make a video of the steps, I would be very grateful. For example, I am very confused about whether I should install Caffe separately, or whether the Caffe in the CTPN folder is enough, and I have a lot of other questions. If anyone could help me with the steps I would be really grateful.

    opened by eslambakr 8
  • Error when compiling using cmake: contrastive_loss_layer.cpp: error: no matching function for call to 'max(float, double)'

    I am trying to compile your version of Caffe and have unfortunately hit this issue, which I don't seem to be able to solve. I would appreciate hints from anyone. Here's what I did:

    1. Installed all dependencies.
    2. Changed Makefile.config to use CPU only, with the Python layer enabled.
    3. Ran cmake -DCPU_ONLY=ON .. (below is the cmake configuration summary):

    ```
    -- ******************* Caffe Configuration Summary *******************
    -- General:
    --   Version           :  (Caffe doesn't declare its version in headers)
    --   Git               :  1595004-dirty
    --   System            :  Linux
    --   C++ compiler      :  /usr/bin/c++
    --   Release CXX flags :  -O3 -DNDEBUG -fPIC -Wall -Wno-sign-compare -Wno-uninitialized
    --   Debug CXX flags   :  -g -fPIC -Wall -Wno-sign-compare -Wno-uninitialized
    --   Build type        :  Release
    --
    --   BUILD_SHARED_LIBS :  ON
    --   BUILD_python      :  ON
    --   BUILD_matlab      :  OFF
    --   BUILD_docs        :  ON
    --   CPU_ONLY          :  ON
    --
    -- Dependencies:
    --   BLAS              :  Yes (Atlas)
    --   Boost             :  Yes (ver. 1.62)
    --   glog              :  Yes
    --   gflags            :  Yes
    --   protobuf          :  Yes (ver. 3.0.0)
    --   lmdb              :  Yes (ver. 0.9.18)
    --   Snappy            :  Yes (ver. 1.1.3)
    --   LevelDB           :  Yes (ver. 1.18)
    --   OpenCV            :  Yes (ver. 2.4.9.1)
    --   CUDA              :  No
    --
    -- Python:
    --   Interpreter       :  /usr/bin/python2.7 (ver. 2.7.13)
    --   Libraries         :  /usr/lib/x86_64-linux-gnu/libpython2.7.so (ver 2.7.13)
    --   NumPy             :  /usr/lib/python2.7/dist-packages/numpy/core/include (ver 1.12.1)
    --
    -- Documentaion:
    --   Doxygen           :  No
    --   config_file       :
    --
    -- Install:
    --   Install path      :  /data/CTPN/caffe/build/install
    --
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /data/CTPN/caffe/build
    ```

    4. Ran make all -j4, which results in the following error:

    ```
    [ 14%] Building CXX object src/caffe/CMakeFiles/caffe.dir/layers/contrastive_loss_layer.cpp.o
    [ 14%] Building CXX object src/caffe/CMakeFiles/caffe.dir/layers/conv_layer.cpp.o
    /home/CTPN/caffe/src/caffe/layers/contrastive_loss_layer.cpp: In instantiation of 'void caffe::ContrastiveLossLayer<Dtype>::Forward_cpu(const std::vector<caffe::Blob<Dtype>*>&, const std::vector<caffe::Blob<Dtype>*>&) [with Dtype = float]':
    /home/CTPN/caffe/src/caffe/layers/contrastive_loss_layer.cpp:118:1:   required from here
    /home/CTPN/caffe/src/caffe/layers/contrastive_loss_layer.cpp:56:30: error: no matching function for call to 'max(float, double)'
       Dtype dist = std::max(margin - sqrt(dist_sq_.cpu_data()[i]), 0.0);
    ```

    I have managed to build and use the original version of Caffe. I faced all kinds of errors during its compilation, except this one.

    Many thanks in advance.

    opened by aliko70 7
  • Is there a limit for number of letters to be present in the word in order to get it detected?

    First of all, thanks for sharing this project. I have tried the project on images of documents; it detects all the text except words with two or fewer letters.

    Is there any option in cfg.py which I could change in order to detect those too?

    opened by vijay1131 7
  • About vertical text!

    Hi there, recently I have been doing some tests on text detection in natural images. I googled a lot and found this great work! After trying some images, I found that most detections fail on vertical text. How do I train this network, and is your dataset available for downloading and training? Thanks! Best regards,

    opened by goodtogood 6
  • from utils.cpu_nms import cpu_nms as nms ImportError: ./src/utils/cpu_nms.so: undefined symbol: PyFPE_jbuf

    ```
    charan/CTPN$ python tools/demo.py
    Traceback (most recent call last):
      File "tools/demo.py", line 24, in <module>
        from detectors import TextProposalDetector, TextDetector
      File "./src/detectors.py", line 4, in <module>
        from utils.cpu_nms import cpu_nms as nms
    ImportError: ./src/utils/cpu_nms.so: undefined symbol: PyFPE_jbuf
    ```

    I am running in a virtual Anaconda environment and using the same Python.

    opened by charan223 5
  • error:Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: Python (known types: AbsVal, Accuracy, ArgMax, BNLL, BatchNorm, Concat, ContrastiveLoss, Convolution, Crop, Data,

    I could successfully build, but I am not able to run the demo. When I execute the demo (no-gpu), the error is as follows:

    ```
    Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: Python (known types: AbsVal, Accuracy, ArgMax, BNLL, BatchNorm, Concat, ContrastiveLoss, Convolution, Crop, Data, Deconvolution, Dropout, DummyData, Eltwise, Embed, EuclideanLoss, Exp, Filter, Flatten, HDF5Data, HDF5Output, HingeLoss, Im2col, ImageData, InfogainLoss, InnerProduct, LRN, Log, Lstm, MVN, MemoryData, MultinomialLogisticLoss, PReLU, Pooling, Power, ROIPooling, ReLU, Reduction, Reshape, Reverse, SPP, Sigmoid, SigmoidCrossEntropyLoss, Silence, Slice, SmoothL1Loss, Softmax, SoftmaxWithLoss, Split, TanH, Threshold, Tile, Transpose, WindowData)
    *** Check failure stack trace: ***
    Aborted (core dumped)
    ```

    opened by striveallen 2
  • Can you share your dataset that mentioned in the article?

    "Our model was trained on 3,000 natural images, including 229 images from the ICDAR 2013 training set. We collected the other images ourselves and manually labelled them with text line bounding boxes".

    opened by hcnhatnam 1
  • How to generate training labels?

    Hi, I read your paper and have some questions about training labels and side-refinement. About training labels, the paper says:

    For text/non-text classification, a binary label is assigned to each positive(text) or negative(non-text) anchor. It is defined by computing the IoU overlap with the GT bounding box (divided by anchor location).

    My first question is how to divide the GT bounding box. I think I need to divide the GT bounding box into many fine-scale bounding boxes (just like the detected text), and then I can label the anchors. According to your paper, I should divide the GT bounding box by anchor location. For a particular image and the VGG-16 net, all anchor locations are fixed by the net architecture, so do I need to divide the GT bounding box by these anchors?

    My next question is how to compute o* in side-refinement. If I divide the GT bounding box by anchor location, then for Equation 4 in your paper, I think x_side is the right/left side of the GT bounding box, and c_x^a is the x-center of an anchor that divides the GT bounding box. And for each GT bounding box, I only need to compute o for the leftmost/rightmost side anchors. Is my understanding correct? Thank you in advance. @tianzhi0549
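
    As one illustration of the "divide by anchor location" reading (my own sketch, not the authors' confirmed procedure), the GT text line can be cut at the fixed 16px grid of the conv5 feature map, producing one fine-scale GT box per covered column; positive anchors are then matched against these slices by IoU:

    ```python
    import numpy as np

    STRIDE = 16  # feature stride of VGG-16 conv5: one anchor column per 16px

    def split_gt_line(x1, y1, x2, y2):
        """Cut a GT text-line box at the fixed anchor grid, giving
        fine-scale GT boxes of width 16 for anchor labelling."""
        first, last = int(np.floor(x1 / STRIDE)), int(np.ceil(x2 / STRIDE))
        return [(c * STRIDE, y1, (c + 1) * STRIDE, y2) for c in range(first, last)]

    # Example: a line spanning x = 25..125 yields 7 slices on the 16px grid.
    print(split_gt_line(25, 40, 125, 60))
    ```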

    opened by zwenwang 0