Detecting Text in Natural Image with Connectionist Text Proposal Network (ECCV'16)

Tian Zhi

Last update: Dec 22, 2022

Overview

Detecting Text in Natural Image with Connectionist Text Proposal Network

This code implements CTPN for scene text detection, as described in:

Z. Tian, W. Huang, T. He, P. He and Y. Qiao: Detecting Text in Natural Image with
Connectionist Text Proposal Network, ECCV, 2016.

An online demo is available at: textdet.com

This demo code (with our trained model) performs text-line detection, without the side-refinement part.

Required hardware

You need a GPU. With CUDNN, about 1.5GB of free memory is required. Without CUDNN, you will need about 5GB of free memory, and testing time will increase slightly. We therefore strongly recommend using CUDNN.

It's also possible to run the program on CPU only, but it's extremely slow due to the non-optimal CPU implementation.

Required software

Python 2.7, Cython, and everything that Caffe depends on.

How to run this code

Clone this repository with git clone https://github.com/tianzhi0549/CTPN.git. This checks out the code for CTPN and the Caffe we ship.
Install the Caffe we ship, using the steps below.
- Install Caffe's dependencies. You can follow this tutorial. Note: Python support is required. The CUDA version we need is 7.0.
- Enter the directory caffe.
- Run cp Makefile.config.example Makefile.config.
- Open Makefile.config and set WITH_PYTHON_LAYER := 1. If you want to use CUDNN, also set USE_CUDNN := 1. Uncomment CPU_ONLY := 1 if you want to compile without GPU support.
  
  Note: To use CUDNN, you need to download it from NVIDIA's official website and install it in advance. The CUDNN version we use is 3.0.
- Run make -j && make pycaffe.
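The Makefile.config edits above can also be scripted. The following is a minimal sketch, not part of the repository: the helper name `enable_flags` is hypothetical, and it assumes the flags appear in Makefile.config.example as commented-out lines of the form `# FLAG := 1`, as they do in upstream Caffe (where the CUDNN switch is spelled `USE_CUDNN`).

```python
import re


def enable_flags(config_text, flags):
    """Return Makefile.config text with each `FLAG := 1` line enabled.

    Hypothetical helper: assumes each flag appears as "# FLAG := 1"
    (commented out) or "FLAG := 1"; flags not found are appended.
    """
    lines = config_text.splitlines()
    remaining = set(flags)
    for i, line in enumerate(lines):
        for flag in flags:
            # Optional leading "#" and loose spacing around ":=".
            pattern = r"^\s*#?\s*%s\s*:=\s*1\s*$" % re.escape(flag)
            if re.match(pattern, line):
                lines[i] = "%s := 1" % flag
                remaining.discard(flag)
    # Append any flag that was not present at all.
    lines.extend("%s := 1" % flag for flag in sorted(remaining))
    return "\n".join(lines) + "\n"


example = "# USE_CUDNN := 1\n# CPU_ONLY := 1\n# WITH_PYTHON_LAYER := 1\n"
print(enable_flags(example, ["WITH_PYTHON_LAYER", "USE_CUDNN"]))
```

In practice you would read Makefile.config, pass its contents through the helper, and write the result back before running make.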
After Caffe is set up, download the trained model (about 78M) from Google Drive or our website, and place it in the models directory. The model file must be named ctpn_trained_model.caffemodel.
Now make sure you are in the root directory of the code. Run make to compile the Cython files.
Run python tools/demo.py for a demo, or python tools/demo.py --no-gpu to run it in CPU mode.
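Before launching the demo, it can help to confirm that the files named in the steps above are in place. This is a small sketch under the assumption that it is run against the repository root; the function names `missing_prerequisites` and `demo_command` are illustrative and do not exist in the repository.

```python
import os

# Paths taken from the steps above, relative to the repository root.
MODEL_RELPATH = os.path.join("models", "ctpn_trained_model.caffemodel")
DEMO_RELPATH = os.path.join("tools", "demo.py")


def missing_prerequisites(repo_root):
    """Return a list of human-readable problems found under repo_root."""
    problems = []
    if not os.path.isfile(os.path.join(repo_root, MODEL_RELPATH)):
        problems.append("trained model not found at " + MODEL_RELPATH)
    if not os.path.isfile(os.path.join(repo_root, DEMO_RELPATH)):
        problems.append(DEMO_RELPATH + " not found; is this the repository root?")
    return problems


def demo_command(use_gpu=True):
    """Build the demo command line; --no-gpu selects CPU mode."""
    cmd = ["python", "tools/demo.py"]
    if not use_gpu:
        cmd.append("--no-gpu")
    return cmd


if __name__ == "__main__":
    for problem in missing_prerequisites(os.getcwd()):
        print("warning: " + problem)
    print(" ".join(demo_command(use_gpu=False)))
```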

How to use other Caffe

If you want to use a different Caffe instead of the one we ship, you need to migrate the following layers into that Caffe.

Reverse
Transpose
Lstm
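To see which of these three layers a given network definition actually uses, one option is to scan the prototxt text for layer types. The sketch below is illustrative only; `custom_layers_used` is a hypothetical helper, and it relies on a simple regular expression rather than a real protobuf parser.

```python
import re

# The three layer types listed above that a stock Caffe would lack.
CUSTOM_LAYERS = {"Reverse", "Transpose", "Lstm"}


def custom_layers_used(prototxt_text):
    """Return the sorted custom layer types referenced in prototxt text."""
    # Matches type: "Name" or type: 'Name'; filler types such as
    # "gaussian" also match but are dropped by the set intersection.
    found = set(re.findall(r"""\btype:\s*['"](\w+)['"]""", prototxt_text))
    return sorted(found & CUSTOM_LAYERS)


sample = (
    'layer { name: "lstm" type: "Lstm" bottom: "lstm_input" top: "lstm" } '
    'layer { name: "r" type: "Reverse" bottom: "a" top: "b" } '
    'layer { name: "fc" type: "Convolution" bottom: "b" top: "fc" }'
)
print(custom_layers_used(sample))  # ['Lstm', 'Reverse']
```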

License

The code is released under the MIT License.

Comments

Implementation detail of training code
Hi, tianzhi, I tried to implement the CTPN training code on the py-faster-rcnn framework (by RBG), but the results were different from yours (worse, of course).

Loss function. Did you revise the loss function (e.g., SmoothL1Loss) in the training code?

Heights of vertical proposals in a text line. A complete text line consists of several vertical anchors in sequence. In your implementation their heights vary only slightly, but in my implementation they vary enormously. Sometimes a proposal fits tightly to the boundary of a single character, so if the heights of the characters in a text line differ greatly, the heights of the proposals differ too. So I want to ask: how did you make the heights and y coordinates of the proposal sequence uniform? Via the LSTM? Or some other change in the Python layer? If the answer is the LSTM, does that mean the LSTM is not working in my implementation? FTPN (CTPN with no RNN) seems to have the same problem.
opened by Xiangyu-CAS 38
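For reference on the height question above, the CTPN paper regresses vertical coordinates relative to each anchor: v_c = (c_y - c_y^a) / h^a and v_h = log(h / h^a). The sketch below encodes and decodes these targets. It does not answer why heights vary in a reimplementation; the function names are illustrative and the anchor values in the example are made up.

```python
import math


def encode_vertical(cy, h, anchor_cy, anchor_h):
    """Relative vertical targets (v_c, v_h) for a box against one anchor."""
    v_c = (cy - anchor_cy) / float(anchor_h)
    v_h = math.log(h / float(anchor_h))
    return v_c, v_h


def decode_vertical(v_c, v_h, anchor_cy, anchor_h):
    """Invert encode_vertical: recover center y and height in pixels."""
    cy = v_c * anchor_h + anchor_cy
    h = math.exp(v_h) * anchor_h
    return cy, h


# Round trip with made-up numbers: a 30 px tall box against a 22 px anchor.
targets = encode_vertical(120.0, 30.0, anchor_cy=112.0, anchor_h=22.0)
print(decode_vertical(targets[0], targets[1], anchor_cy=112.0, anchor_h=22.0))
```

Because every proposal in a line is decoded independently from its own anchor, nothing in this formula by itself forces neighbouring proposals to share a height.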
Performance on tilt and perspective texts

Dear Tianzhi: I tried your demo and obtained exactly the same result on ICDAR 2013 Challenge 2 as you submitted. It works perfectly! BTW, OpenCV 3 and CUDA 7.5 are compatible with this project. Now I am trying to test the performance on ICDAR 2015 Challenge 4, which consists of many tilted and perspective texts, but the bounding box returned by your method is a rectangle around the whole text line, instead of separate words represented by 8 coordinates. Did you submit the rectangle (4 coordinates) of the whole text line in Challenge 4 as you did in Challenge 2? If not, what kind of adjustment is applied? The publication did not mention anything about tilted and perspective texts, so I got a little confused.

Best Regards

opened by Xiangyu-CAS 23
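As an illustration of the format gap raised in this question, an axis-aligned rectangle can be written out as 8 coordinates by listing its four corners. This is only a sketch: the clockwise-from-top-left corner order and the comma-separated output are assumptions about the target format, the helper names are hypothetical, and the conversion does not recover any tilt. It says nothing about what the authors actually submitted.

```python
def rect_to_quad(x_min, y_min, x_max, y_max):
    """Corners of an axis-aligned box, clockwise from the top-left."""
    return [x_min, y_min, x_max, y_min, x_max, y_max, x_min, y_max]


def format_quad_line(quad):
    """Comma-separated integer coordinates (assumed output format)."""
    return ",".join(str(int(round(v))) for v in quad)


print(format_quad_line(rect_to_quad(10, 20, 110, 50)))
# 10,20,110,20,110,50,10,50
```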
Changes to caffe comparing to the official one?

Is it possible to list the major changes in the version of Caffe you use? I want to know the potential issues or conflicts when merging it with a newer version of Caffe.

My GPU has to run with CUDA 8.0, which is not compatible with this version of Caffe.

opened by qingswu 16
Training cannot converge when adding an LSTM layer

I implemented the RPN without LSTM from the paper, based on the Faster R-CNN code. The results are very good for horizontal text. But when I add a bi-directional LSTM layer after the last conv layer, the model does not converge, and the scores are the same for all images. Has anyone met this problem?

`
name: "VGG_ILSVRC_16_layers"
layer { name: 'input-data' type: 'Python' top: 'data' top: 'im_info' top: 'gt_boxes' python_param { module: 'roi_data_layer.layer' layer: 'RoIDataLayer' param_str: "'num_classes': 2" } }
layer { name: "conv1_1" type: "Convolution" bottom: "data" top: "conv1_1" param { lr_mult: 0 decay_mult: 0 } param { lr_mult: 0 decay_mult: 0 } convolution_param { num_output: 64 pad: 1 kernel_size: 3 } }
layer { name: "relu1_1" type: "ReLU" bottom: "conv1_1" top: "conv1_1" }
layer { name: "conv1_2" type: "Convolution" bottom: "conv1_1" top: "conv1_2" param { lr_mult: 0 decay_mult: 0 } param { lr_mult: 0 decay_mult: 0 } convolution_param { num_output: 64 pad: 1 kernel_size: 3 } }
layer { name: "relu1_2" type: "ReLU" bottom: "conv1_2" top: "conv1_2" }
layer { name: "pool1" type: "Pooling" bottom: "conv1_2" top: "pool1" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
layer { name: "conv2_1" type: "Convolution" bottom: "pool1" top: "conv2_1" param { lr_mult: 0 decay_mult: 0 } param { lr_mult: 0 decay_mult: 0 } convolution_param { num_output: 128 pad: 1 kernel_size: 3 } }
layer { name: "relu2_1" type: "ReLU" bottom: "conv2_1" top: "conv2_1" }
layer { name: "conv2_2" type: "Convolution" bottom: "conv2_1" top: "conv2_2" param { lr_mult: 0 decay_mult: 0 } param { lr_mult: 0 decay_mult: 0 } convolution_param { num_output: 128 pad: 1 kernel_size: 3 } }
layer { name: "relu2_2" type: "ReLU" bottom: "conv2_2" top: "conv2_2" }
layer { name: "pool2" type: "Pooling" bottom: "conv2_2" top: "pool2" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
layer { name: "conv3_1" type: "Convolution" bottom: "pool2" top: "conv3_1" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 256 pad: 1 kernel_size: 3 } }
layer { name: "relu3_1" type: "ReLU" bottom: "conv3_1" top: "conv3_1" }
layer { name: "conv3_2" type: "Convolution" bottom: "conv3_1" top: "conv3_2" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 256 pad: 1 kernel_size: 3 } }
layer { name: "relu3_2" type: "ReLU" bottom: "conv3_2" top: "conv3_2" }
layer { name: "conv3_3" type: "Convolution" bottom: "conv3_2" top: "conv3_3" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 256 pad: 1 kernel_size: 3 } }
layer { name: "relu3_3" type: "ReLU" bottom: "conv3_3" top: "conv3_3" }
layer { name: "pool3" type: "Pooling" bottom: "conv3_3" top: "pool3" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
layer { name: "conv4_1" type: "Convolution" bottom: "pool3" top: "conv4_1" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 512 pad: 1 kernel_size: 3 } }
layer { name: "relu4_1" type: "ReLU" bottom: "conv4_1" top: "conv4_1" }
layer { name: "conv4_2" type: "Convolution" bottom: "conv4_1" top: "conv4_2" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 512 pad: 1 kernel_size: 3 } }
layer { name: "relu4_2" type: "ReLU" bottom: "conv4_2" top: "conv4_2" }
layer { name: "conv4_3" type: "Convolution" bottom: "conv4_2" top: "conv4_3" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 512 pad: 1 kernel_size: 3 } }
layer { name: "relu4_3" type: "ReLU" bottom: "conv4_3" top: "conv4_3" }
layer { name: "pool4" type: "Pooling" bottom: "conv4_3" top: "pool4" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
layer { name: "conv5_1" type: "Convolution" bottom: "pool4" top: "conv5_1" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 512 pad: 1 kernel_size: 3 } }
layer { name: "relu5_1" type: "ReLU" bottom: "conv5_1" top: "conv5_1" }
layer { name: "conv5_2" type: "Convolution" bottom: "conv5_1" top: "conv5_2" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 512 pad: 1 kernel_size: 3 } }
layer { name: "relu5_2" type: "ReLU" bottom: "conv5_2" top: "conv5_2" }
layer { name: "conv5_3" type: "Convolution" bottom: "conv5_2" top: "conv5_3" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 512 pad: 1 kernel_size: 3 } }
layer { name: "relu5_3" type: "ReLU" bottom: "conv5_3" top: "conv5_3" }

#========= RPN ============

# prepare lstm inputs
layer { name: "im2col" bottom: "conv5_3" top: "im2col" type: "Im2col" convolution_param { pad: 1 kernel_size: 3 } }
layer { name: "im2col_transpose" top: "im2col_transpose" bottom: "im2col" type: "Transpose" transpose_param { dim: 3 dim: 2 dim: 0 dim: 1 } }
layer { name: "lstm_input" type: "Reshape" bottom: "im2col_transpose" top: "lstm_input" reshape_param { shape { dim: -1 } axis: 1 num_axes: 2 } }

layer { name: "lstm" type: "Lstm" bottom: "lstm_input" top: "lstm" lstm_param { num_output: 128 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" } clipping_threshold: 1 } }

#===================== rlstm ===================
layer { name: "lstm-reverse1" type: "Reverse" bottom: "lstm_input" top: "rlstm_input" reverse_param { axis: 0 } }
layer { name: "rlstm" type: "Lstm" bottom: "rlstm_input" top: "rlstm-output" lstm_param { num_output: 128 } }
layer { name: "lstm-reverse2" type: "Reverse" bottom: "rlstm-output" top: "rlstm" reverse_param { axis: 0 } }

# merge lstm and rlstm
layer { name: "merge_lstm_rlstm" type: "Concat" bottom: "lstm" bottom: "rlstm" top: "merge_lstm_rlstm" concat_param { axis: 2 } }
layer { name: "lstm_output_reshape" type: "Reshape" bottom: "merge_lstm_rlstm" top: "lstm_output_reshape" reshape_param { shape { dim: -1 dim: 1 } axis: 1 num_axes: 1 } }

# transpose size of output as (N, C, H, W)
layer { name: "lstm_output" type: "Transpose" bottom: "lstm_output_reshape" top: "lstm_output" transpose_param { dim: 2 dim: 3 dim: 1 dim: 0 } }
layer { name: "fc" bottom: "lstm_output" top: "fc" type: "Convolution" convolution_param { num_output: 512 kernel_size: 1 } }
layer { name: "relu_fc" type: "ReLU" bottom: "fc" top: "fc" }

layer { name: "rpn_cls_score" type: "Convolution" bottom: "fc" top: "rpn_cls_score" param { lr_mult: 1.0 } param { lr_mult: 2.0 }
  convolution_param {
    num_output: 20 # 2(bg/fg) * 9(anchors)
    kernel_size: 1 pad: 0 stride: 1
  }
}

layer { name: "rpn_bbox_pred" type: "Convolution" bottom: "fc" top: "rpn_bbox_pred" param { lr_mult: 1.0 } param { lr_mult: 2.0 }
  convolution_param {
    num_output: 20 # 4 * 9(anchors)
    kernel_size: 1 pad: 0 stride: 1
  }
}

layer { bottom: "rpn_cls_score" top: "rpn_cls_score_reshape" name: "rpn_cls_score_reshape" type: "Reshape" reshape_param { shape { dim: 0 dim: 2 dim: -1 dim: 0 } } }

layer { name: 'rpn-data' type: 'Python' bottom: 'rpn_cls_score' bottom: 'gt_boxes' bottom: 'im_info' bottom: 'data' top: 'rpn_labels' top: 'rpn_bbox_targets' top: 'rpn_bbox_inside_weights' top: 'rpn_bbox_outside_weights' python_param { module: 'rpn.anchor_target_layer' layer: 'AnchorTargetLayer' param_str: "'feat_stride': 16" } }

layer { name: "rpn_loss_cls" type: "SoftmaxWithLoss" bottom: "rpn_cls_score_reshape" bottom: "rpn_labels" propagate_down: 1 propagate_down: 0 top: "rpn_cls_loss" loss_weight: 1 loss_param { ignore_label: -1 normalize: true } }

layer { name: "rpn_loss_bbox" type: "SmoothL1Loss" bottom: "rpn_bbox_pred" bottom: "rpn_bbox_targets" bottom: 'rpn_bbox_inside_weights' bottom: 'rpn_bbox_outside_weights' top: "rpn_loss_bbox" loss_weight: 1 smooth_l1_loss_param { sigma: 3.0 } }
`

opened by dajiangxiaoyan 13
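When debugging a network like the one posted in this question, it can help to trace blob shapes through the Im2col, Transpose, Reshape, LSTM and Concat steps by hand. The sketch below does this in plain Python. It rests on assumptions that are not confirmed by the source: that the custom Transpose layer maps output axis i to input axis dim[i], that the Lstm layer treats axis 0 as time and axis 1 as batch, and that Reshape follows stock Caffe semantics for axis and num_axes. All function names are illustrative.

```python
def product(values):
    """Multiply all values together (1 for an empty sequence)."""
    result = 1
    for v in values:
        result *= v
    return result


def im2col_shape(shape, kernel_size):
    # A 3x3 kernel with pad 1 and stride 1 keeps H and W;
    # the channel count grows by kernel_size * kernel_size.
    n, c, h, w = shape
    return (n, c * kernel_size * kernel_size, h, w)


def transpose_shape(shape, dims):
    # Assumed semantics: output axis i takes input axis dims[i].
    return tuple(shape[d] for d in dims)


def reshape_shape(shape, new_dims, axis, num_axes):
    # Caffe-style Reshape: replace num_axes axes starting at `axis`;
    # a single -1 is inferred from the replaced element count.
    replaced = shape[axis:axis + num_axes]
    known = product(d for d in new_dims if d != -1)
    dims = tuple(product(replaced) // known if d == -1 else d for d in new_dims)
    return shape[:axis] + dims + shape[axis + num_axes:]


def trace(conv5_3_shape, lstm_units=128):
    """Shape of lstm_output for a given conv5_3 shape (N, C, H, W)."""
    x = im2col_shape(conv5_3_shape, 3)                  # (N, C*9, H, W)
    x = transpose_shape(x, (3, 2, 0, 1))                # (W, H, N, C*9)
    x = reshape_shape(x, (-1,), axis=1, num_axes=2)     # (W, H*N, C*9)
    x = x[:2] + (2 * lstm_units,)                       # lstm + rlstm, concat on axis 2
    x = reshape_shape(x, (-1, 1), axis=1, num_axes=1)   # (W, H*N, 1, 256)
    return transpose_shape(x, (2, 3, 1, 0))             # (1, 256, H*N, W)


print(trace((1, 512, 38, 50)))  # (1, 256, 38, 50)
```

Under these assumptions each image row becomes one sequence of length W, and the final blob is back in (N, C, H, W) order only when N is 1.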
Implementation of Side-Refinement

Hi, TianZhi, I am trying to implement the training phase of CTPN with TensorFlow, based on your code and Faster R-CNN. However, I don't quite understand the meaning of side-refinement or offset regression in your paper. The problem is that I don't understand the meaning of x_{side} in your paper, so I can't implement the training loss function. Can you release the code for side-refinement or tell me how to calculate x_{side}? It would help me understand your excellent idea. Many thanks!

opened by senliuy 10
"caffe.LayerParameter" has no field named "transpose_param"

I am trying to run the demo.py file and am getting the following error. Is there something I am doing wrong? I also have the BVLC version of Caffe installed in a different location.

This is the error I am getting while trying to run demo.py

WARNING: Logging before InitGoogleLogging() is written to STDERR
W0104 15:27:11.308679 30453 _caffe.cpp:154] DEPRECATION WARNING - deprecated use of Python interface
W0104 15:27:11.308717 30453 _caffe.cpp:155] Use this instead (with the named "weights" parameter):
W0104 15:27:11.308722 30453 _caffe.cpp:157] Net('models/deploy.prototxt', 1, weights='models/ctpn_trained_model.caffemodel')
[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 387:19: Message type "caffe.LayerParameter" has no field named "transpose_param".
F0104 15:27:11.310125 30453 upgrade_proto.cpp:79] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: models/deploy.prototxt
*** Check failure stack trace: ***
Aborted (core dumped)

opened by mahaling 9
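One plausible cause of this error, given that a second Caffe is installed, is that Python imports the stock Caffe (which has no transpose_param) instead of the one shipped with CTPN. The sketch below reorders a module search path so that the shipped build comes first. It assumes the shipped pycaffe lives under caffe/python in the repository, as in upstream Caffe; the helper name and the example paths are hypothetical.

```python
import os


def prefer_shipped_caffe(repo_root, search_path):
    """Return search_path with <repo_root>/caffe/python moved to the front."""
    shipped = os.path.join(repo_root, "caffe", "python")
    # Drop any existing copy of the shipped path to avoid duplicates.
    rest = [p for p in search_path
            if os.path.normpath(p) != os.path.normpath(shipped)]
    return [shipped] + rest


example_path = ["/usr/lib/python2.7", "/opt/CTPN/caffe/python", "/opt/bvlc/python"]
print(prefer_shipped_caffe("/opt/CTPN", example_path))
```

A typical use would be to apply this to sys.path before importing caffe, then print caffe.__file__ to confirm which build was loaded.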
HOW to run this code

First of all, thank you tianzhi0549 for your great work. I tried to run this code but couldn't, and it seems that a lot of people have managed to run it, so I need help. If anyone could write the steps in a clearer way or make a video of them, I would be very grateful. For example, I am confused about whether I should install Caffe separately or whether the Caffe in the CTPN folder is enough, and I have a lot of other questions, so if anyone could help me with the steps I would be really grateful.

opened by eslambakr 8
Error when compiling using cmake: contrastive_loss_layer.cpp: error: no matching function for call to 'max(float, double)'
I am trying to compile your version of Caffe and have run into this issue, which I don't seem to be able to solve. I would appreciate hints from anyone. Here's what I did:

Installed all dependencies

Changed the Makefile.config file to use CPU only, with the Python layer set to 1

cmake --cmake -DCPU_ONLY=ON .. (the cmake configuration summary is below):

-- ******************* Caffe Configuration Summary *******************
-- General:
--   Version           : (Caffe doesn't declare its version in headers)
--   Git               : 1595004-dirty
--   System            : Linux
--   C++ compiler      : /usr/bin/c++
--   Release CXX flags : -O3 -DNDEBUG -fPIC -Wall -Wno-sign-compare -Wno-uninitialized
--   Debug CXX flags   : -g -fPIC -Wall -Wno-sign-compare -Wno-uninitialized
--   Build type        : Release
--
--   BUILD_SHARED_LIBS : ON
--   BUILD_python      : ON
--   BUILD_matlab      : OFF
--   BUILD_docs        : ON
--   CPU_ONLY          : ON
--
-- Dependencies:
--   BLAS              : Yes (Atlas)
--   Boost             : Yes (ver. 1.62)
--   glog              : Yes
--   gflags            : Yes
--   protobuf          : Yes (ver. 3.0.0)
--   lmdb              : Yes (ver. 0.9.18)
--   Snappy            : Yes (ver. 1.1.3)
--   LevelDB           : Yes (ver. 1.18)
--   OpenCV            : Yes (ver. 2.4.9.1)
--   CUDA              : No
--
-- Python:
--   Interpreter       : /usr/bin/python2.7 (ver. 2.7.13)
--   Libraries         : /usr/lib/x86_64-linux-gnu/libpython2.7.so (ver 2.7.13)
--   NumPy             : /usr/lib/python2.7/dist-packages/numpy/core/include (ver 1.12.1)
--
-- Documentaion:
--   Doxygen           : No
--   config_file       :
--
-- Install:
--   Install path      : /data/CTPN/caffe/build/install
--
-- Configuring done
-- Generating done
-- Build files have been written to: /data/CTPN/caffe/build

make all -j4 (results in the following error):

[ 14%] Building CXX object src/caffe/CMakeFiles/caffe.dir/layers/contrastive_loss_layer.cpp.o
[ 14%] Building CXX object src/caffe/CMakeFiles/caffe.dir/layers/conv_layer.cpp.o
/home/CTPN/caffe/src/caffe/layers/contrastive_loss_layer.cpp: In instantiation of 'void caffe::ContrastiveLossLayer::Forward_cpu(const std::vector<caffe::Blob>&, const std::vector<caffe::Blob>&) [with Dtype = float]':
/home/CTPN/caffe/src/caffe/layers/contrastive_loss_layer.cpp:118:1: required from here
/home/CTPN/caffe/src/caffe/layers/contrastive_loss_layer.cpp:56:30: error: no matching function for call to 'max(float, double)'
Dtype dist = std::max(margin - sqrt(dist_sq_.cpu_data()[i]), 0.0);
~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I have managed to build and use the original version of Caffe. I faced all kinds of errors during its compilation, but not this one.

Many thanks in advance.
opened by aliko70 7
Is there a limit for number of letters to be present in the word in order to get it detected?

First of all, thanks for sharing this project. I have tried the project on images of documents. It detects all text except words with 2 letters or fewer.

Is there any option in cfg.py which I could change in order to detect those too?

opened by vijay1131 7
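One reason very short words can be missed in a pipeline of this kind is that text lines are built from fixed-width proposals (16 pixels wide in the paper), and lines made of too few proposals or with too small a width-to-height ratio may be filtered out. The sketch below illustrates such a filter. The parameter names and default values are hypothetical placeholders and are not taken from cfg.py; check that file for the actual options.

```python
PROPOSAL_WIDTH = 16  # fixed proposal width in pixels, as described in the paper


def keep_text_line(num_proposals, height, min_num_proposals=2, min_ratio=1.2):
    """Decide whether a candidate text line survives filtering.

    min_num_proposals and min_ratio are hypothetical thresholds used
    only to illustrate the idea.
    """
    width = num_proposals * PROPOSAL_WIDTH
    enough_proposals = num_proposals >= min_num_proposals
    wide_enough = width / float(height) >= min_ratio
    return enough_proposals and wide_enough


# A two-letter word about 40 px tall may span only two proposals.
print(keep_text_line(num_proposals=2, height=40))
# A longer word of the same height spans more proposals.
print(keep_text_line(num_proposals=6, height=40))
```

Lowering thresholds of this kind would let shorter words through at the cost of more false positives.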
About vertical text

Hi there, I am planning to do some tests on text detection in natural images. I googled a lot and found this great work! After trying some images, I found that most detections fail on vertical text. How can I train this network, and is your dataset available for download and training? Thanks! Best regards,

opened by goodtogood 6
from utils.cpu_nms import cpu_nms as nms ImportError: ./src/utils/cpu_nms.so: undefined symbol: PyFPE_jbuf

charan/CTPN$ python tools/demo.py
Traceback (most recent call last):
  File "tools/demo.py", line 24, in <module>
    from detectors import TextProposalDetector, TextDetector
  File "./src/detectors.py", line 4, in <module>
    from utils.cpu_nms import cpu_nms as nms
ImportError: ./src/utils/cpu_nms.so: undefined symbol: PyFPE_jbuf

I am running in an Anaconda virtual environment and using the same Python.

opened by charan223 5
error:Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: Python (known types: AbsVal, Accuracy, ArgMax, BNLL, BatchNorm, Concat, ContrastiveLoss, Convolution, Crop, Data,

I could build successfully, but I am not able to run the demo. When I execute the demo (no-gpu), the error is as follows:

Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: Python (known types: AbsVal, Accuracy, ArgMax, BNLL, BatchNorm, Concat, ContrastiveLoss, Convolution, Crop, Data, Deconvolution, Dropout, DummyData, Eltwise, Embed, EuclideanLoss, Exp, Filter, Flatten, HDF5Data, HDF5Output, HingeLoss, Im2col, ImageData, InfogainLoss, InnerProduct, LRN, Log, Lstm, MVN, MemoryData, MultinomialLogisticLoss, PReLU, Pooling, Power, ROIPooling, ReLU, Reduction, Reshape, Reverse, SPP, Sigmoid, SigmoidCrossEntropyLoss, Silence, Slice, SmoothL1Loss, Softmax, SoftmaxWithLoss, Split, TanH, Threshold, Tile, Transpose, WindowData)
*** Check failure stack trace: ***
Aborted (core dumped)

opened by striveallen 2
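The known-types list in this error contains Lstm, Reverse and Transpose but not Python, which is consistent with a build made without WITH_PYTHON_LAYER := 1. A small sketch for checking such a message against the layer types a network needs follows; `missing_layer_types` is a hypothetical helper that only parses the error text.

```python
import re


def missing_layer_types(error_text, required):
    """Return required layer types absent from the 'known types' list."""
    match = re.search(r"known types:\s*([^)]*)", error_text)
    known = set()
    if match:
        known = set(name.strip() for name in match.group(1).split(",")
                    if name.strip())
    return sorted(set(required) - known)


message = ("Unknown layer type: Python (known types: AbsVal, Lstm, "
           "Reverse, Transpose, WindowData)")
print(missing_layer_types(message, ["Python", "Lstm", "Reverse", "Transpose"]))
# ['Python']
```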
Can you share the dataset mentioned in the article?

"Our model was trained on 3,000 natural images, including 229 images from the ICDAR 2013 training set. We collected the other images ourselves and manually labelled them with text line bounding boxes".

opened by hcnhatnam 1
How to generate training labels?

Hi, I read your paper and have some questions about training labels and side-refinement. About training labels, the paper says:

For text/non-text classification, a binary label is assigned to each positive(text) or negative(non-text) anchor. It is defined by computing the IoU overlap with the GT bounding box (divided by anchor location).

My first question is how to divide the GT bounding box. I think I need to divide the GT bounding box into many fine-scale bounding boxes (just like when detecting text), and then I can label the anchors. According to your paper, I think I should divide the GT bounding box by anchor location. For a particular image and the VGG-16 net, all anchor locations are fixed by the net architecture, so do I need to divide the GT bounding box by these anchors?
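One possible reading of "divided by anchor location" is to cut the ground-truth box into vertical strips aligned to the 16-pixel anchor grid, so that each strip can be compared with the anchors at that column. The sketch below shows that reading only; it has not been confirmed by the authors, the function name is hypothetical, and integer pixel coordinates with inclusive bounds are assumed.

```python
def split_gt_box(x_min, y_min, x_max, y_max, stride=16):
    """Cut a box into strips aligned to a grid of `stride` pixels.

    Coordinates are inclusive integers; the first and last strips are
    clipped to the box, so they may be narrower than `stride`.
    """
    strips = []
    x = (int(x_min) // stride) * stride  # left edge of the first grid cell
    while x <= x_max:
        left = max(x, x_min)
        right = min(x + stride - 1, x_max)
        strips.append((left, y_min, right, y_max))
        x += stride
    return strips


for strip in split_gt_box(20, 10, 70, 40):
    print(strip)
# (20, 10, 31, 40)
# (32, 10, 47, 40)
# (48, 10, 63, 40)
# (64, 10, 70, 40)
```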

The next question is how to compute o* in side-refinement. If I divide the GT bounding box by anchor location, then for equation 4 in your paper, I think x_{side} is the right/left side of the GT bounding box, and c_x^a is the anchor that divides the GT bounding box. And for each GT bounding box I only need to compute o* for the left/right side anchors. Is my understanding correct? Thank you in advance. @tianzhi0549

opened by zwenwang 0
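For reference, the side-refinement offset in the paper is o* = (x*_side - c_x^a) / w^a, where x*_side is the ground-truth left or right boundary, c_x^a is the anchor's center x coordinate, and w^a is the fixed anchor width of 16 pixels. The sketch below computes the offset and its inverse. The function names and the example numbers are illustrative, and which anchors count as side anchors is left open here.

```python
ANCHOR_WIDTH = 16.0  # fixed anchor width w^a in pixels


def side_offset(x_side, anchor_cx, anchor_w=ANCHOR_WIDTH):
    """Relative offset from the anchor center to a text-line side."""
    return (x_side - anchor_cx) / anchor_w


def refine_side(offset, anchor_cx, anchor_w=ANCHOR_WIDTH):
    """Invert side_offset: recover the side x coordinate in pixels."""
    return offset * anchor_w + anchor_cx


# A ground-truth left boundary at x = 20 with the anchor centered at x = 24.
o_star = side_offset(20.0, 24.0)
print(o_star)                     # -0.25
print(refine_side(o_star, 24.0))  # 20.0
```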

