Geometric Augmentation for Text Image

Canjie Luo

Last update: Jan 5, 2023

Overview

Text Image Augmentation

A general geometric augmentation tool for text images, introduced in the CVPR 2020 paper "Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition". We provide the tool to reduce overfitting and improve the robustness of text recognizers.

Note that this is a general toolkit. Please customize it for your specific task. If the repo benefits your work, please cite the papers listed under Citation.

News

2020-02 The paper "Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition" was accepted to CVPR 2020. It is a preliminary attempt at smart augmentation.
2019-11 The paper "Decoupled Attention Network for Text Recognition" was accepted to AAAI 2020. This augmentation tool was used in its handwritten text recognition experiments.
2019-04 We applied this tool in the ReCTS competition of ICDAR 2019. Our ensemble model won the championship.
2019-01 The similarity transformation was specifically customized for geometric augmentation of text images.

Requirements

GCC 4.8.*
Python 2.7.*
Boost 1.67
OpenCV 2.4.*

We recommend using Anaconda to manage the versions of your dependencies. For example:

    conda install boost=1.67.0

Installation

Build library:

    mkdir build
    cd build
    cmake -D CUDA_USE_STATIC_CUDA_RUNTIME=OFF ..
    make

Copy Augment.so to the target folder and follow demo.py to use the tool:

    cp Augment.so ..
    cd ..
    python demo.py
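
If CMake cannot locate the Python headers or library (for example, "Could NOT find PythonLibs"), one option is to pass the paths explicitly through -DPYTHON_INCLUDE_DIR and -DPYTHON_LIBRARY, the flags that also appear in one of the build reports in the comments. The sketch below prints candidate values using only the standard sysconfig module. The helper names cmake_python_flags and numpy_include_dir are hypothetical, and whether these paths resolve the error depends on your environment.

```python
import os
import sysconfig


def cmake_python_flags():
    """Return CMake -D flags pointing at the running interpreter's headers and library."""
    include_dir = sysconfig.get_paths()["include"]
    lib_dir = sysconfig.get_config_var("LIBDIR")      # may be None on some platforms
    lib_name = sysconfig.get_config_var("LDLIBRARY")  # e.g. libpython2.7.so
    flags = ["-DPYTHON_INCLUDE_DIR=" + include_dir]
    if lib_dir and lib_name:
        flags.append("-DPYTHON_LIBRARY=" + os.path.join(lib_dir, lib_name))
    return flags


def numpy_include_dir():
    """Return NumPy's header directory, or None if NumPy is not installed."""
    try:
        import numpy
    except ImportError:
        return None
    return numpy.get_include()


if __name__ == "__main__":
    # Paste the first line after "cmake" and before ".."
    print(" ".join(cmake_python_flags()))
    # The directory that should contain numpy/ndarrayobject.h
    print("NumPy headers: {}".format(numpy_include_dir()))
```

Run the script with the same interpreter you intend to build against, so that the reported paths match the Python that will later import Augment.so.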

Demo

The demo shows three transformations: distortion, stretch, and perspective.
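
The citation list includes the moving least squares (MLS) image deformation paper by Schaefer et al., and the News section mentions a customized similarity transformation. As a rough illustration of that idea, and not the repository's C++ implementation, the following sketch applies an MLS similarity deformation to a single 2D point, with points represented as complex numbers. The function name mls_similarity, the alpha default, and the sample control points are assumptions made for this example.

```python
def mls_similarity(v, src, dst, alpha=1.0):
    """Map point v under an MLS similarity deformation.

    v        : complex, the point to move (x + 1j*y)
    src, dst : lists of complex control points before / after deformation
    alpha    : falloff exponent of the weights w_i = 1 / |p_i - v|^(2*alpha)
    """
    weights = []
    for p, q in zip(src, dst):
        d2 = abs(p - v) ** 2
        if d2 == 0.0:
            return q  # v coincides with a control point: interpolate exactly
        weights.append(1.0 / d2 ** alpha)

    # Weighted centroids of the source and destination control points.
    w_sum = sum(weights)
    p_star = sum(w * p for w, p in zip(weights, src)) / w_sum
    q_star = sum(w * q for w, q in zip(weights, dst)) / w_sum

    # Weighted least-squares fit of a rotation plus uniform scale, i.e. a
    # single complex multiplier m, mapping centered src onto centered dst.
    num = sum(w * (p - p_star).conjugate() * (q - q_star)
              for w, p, q in zip(weights, src, dst))
    den = sum(w * abs(p - p_star) ** 2 for w, p in zip(weights, src))
    if den == 0.0:
        return v - p_star + q_star  # degenerate input: fall back to translation
    m = num / den
    return m * (v - p_star) + q_star


if __name__ == "__main__":
    # Corners of a 200 x 64 image as control points; move only the top-left one.
    corners = [0j, 200 + 0j, 200 + 64j, 64j]
    moved = [10 + 5j, 200 + 0j, 200 + 64j, 64j]
    print(mls_similarity(100 + 32j, corners, moved))
```

Two properties are easy to check by hand: if every control point is shifted by the same offset, every point is shifted by that offset, and a point that sits exactly on a control point is sent to that control point's new position. Warping a whole image would repeat this mapping per pixel (or on a coarse grid) and then resample.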

Speed

Transforming an image of size 64 × 200 (H × W) takes less than 3 ms on a 2.0 GHz CPU. The process can be accelerated by running multi-process batch samplers on the fly, for example by setting "num_workers" in a PyTorch DataLoader.
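
To check the per-image cost on your own hardware, a small timing helper along the following lines can be used. This is a minimal sketch: fake_augment is a placeholder workload standing in for a call into Augment.so, and mean_latency_ms is a hypothetical helper name.

```python
import timeit


def mean_latency_ms(transform, image, repeats=100):
    """Return the average wall-clock time of transform(image) in milliseconds."""
    start = timeit.default_timer()
    for _ in range(repeats):
        transform(image)
    return (timeit.default_timer() - start) * 1000.0 / repeats


def fake_augment(image):
    """Placeholder workload: horizontally flips a nested-list 'image'."""
    return [row[::-1] for row in image]


if __name__ == "__main__":
    image = [[0] * 200 for _ in range(64)]  # H=64, W=200, as in the text
    print("mean latency: {:.3f} ms".format(mean_latency_ms(fake_augment, image)))
```

Replace fake_augment with the actual transformation call to measure the tool itself. The number reported depends on the machine and on the image size.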

Improvement for Recognition

We compare the accuracy of CRNN with and without data augmentation. For each dataset, the model is trained using only that dataset's small training set.

Dataset	IIIT5K	IC13	IC15
Without Data Augmentation	40.8%	6.8%	8.7%
With Data Augmentation	53.4%	9.6%	24.9%
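
The absolute gains implied by the table can be read off with a few lines of arithmetic. The numbers are copied from the table; the helper name absolute_gains is only illustrative.

```python
# Accuracy (%) copied from the table above.
without_aug = {"IIIT5K": 40.8, "IC13": 6.8, "IC15": 8.7}
with_aug = {"IIIT5K": 53.4, "IC13": 9.6, "IC15": 24.9}


def absolute_gains(before, after):
    """Return the accuracy difference in percentage points per dataset."""
    return {name: round(after[name] - before[name], 1) for name in before}


gains = absolute_gains(without_aug, with_aug)
for name in ("IIIT5K", "IC13", "IC15"):
    print("{}: +{:.1f} points".format(name, gains[name]))
# IIIT5K: +12.6 points
# IC13: +2.8 points
# IC15: +16.2 points
```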

Citation

@inproceedings{luo2020learn,
  author = {Canjie Luo and Yuanzhi Zhu and Lianwen Jin and Yongpan Wang},
  title = {Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition},
  booktitle = {CVPR},
  year = {2020}
}

@inproceedings{wang2020decoupled,
  author = {Tianwei Wang and Yuanzhi Zhu and Lianwen Jin and Canjie Luo and Xiaoxue Chen and Yaqiang Wu and Qianying Wang and Mingxiang Cai},
  title = {Decoupled Attention Network for Text Recognition},
  booktitle = {AAAI},
  year = {2020}
}

@article{schaefer2006image,
  title={Image deformation using moving least squares},
  author={Schaefer, Scott and McPhail, Travis and Warren, Joe},
  journal={ACM Transactions on Graphics (TOG)},
  volume={25},
  number={3},
  pages={533--540},
  year={2006},
  publisher={ACM New York, NY, USA}
}

Acknowledgment

Thanks to the following developers for their contributions.

@keeofkoo

@cxcxcxcx

@Yati Sagade

Attention

The tool is only free for academic research purposes.

Comments

CMake fail

Thanks for your code. When I compiled it following readme.md, I got the following error:

Could NOT find PythonLibs (missing: PYTHON_LIBRARIES PYTHON_INCLUDE_DIRS)

opened by huizhang0110 5

I got some trouble in 'make'

    [ 12%] Building CXX object CMakeFiles/Augment.dir/src/conversion.cpp.o
    In file included from /home/fbas/下载/Scene-Text-Image-Transformer-master/src/conversion.cpp:1:0:
    /home/fbas/下载/Scene-Text-Image-Transformer-master/include/conversion.h:8:33: fatal error: numpy/ndarrayobject.h: No such file or directory
    compilation terminated.
    CMakeFiles/Augment.dir/build.make:62: recipe for target 'CMakeFiles/Augment.dir/src/conversion.cpp.o' failed
    make[2]: *** [CMakeFiles/Augment.dir/src/conversion.cpp.o] Error 1
    CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/Augment.dir/all' failed
    make[1]: *** [CMakeFiles/Augment.dir/all] Error 2
    Makefile:83: recipe for target 'all' failed
    make: *** [all] Error 2

opened by Jinwanqi 4

Floating point exception (core dumped) when processing images of different sizes

If I resize all images to the same size before transforming, no error occurs. For reference, all images have height 70 and widths between 200 and 400, so none of them is very small.

opened by cuhk-hbsun 3

Running into problems during make

I cloned the repo and tried building it with the following command:

    cmake .. -D CUDA_USE_STATIC_CUDA_RUNTIME=OFF -DPYTHON_INCLUDE_DIR=$(python -c "from distutils.sysconfig import get_python_inc; print(get_python_inc())") -DPYTHON_LIBRARY=$(python -c "import distutils.sysconfig as sysconfig; print(sysconfig.get_config_var('LIBDIR'))")

During make, I got this error log:

    In file included from /usr/include/python2.7/numpy/ndarraytypes.h:1809:0,
                     from /usr/include/python2.7/numpy/ndarrayobject.h:18,
                     from /y/x/Text-Image-Augmentation/include/conversion.h:8,
                     from /y/x/Text-Image-Augmentation/src/conversion.cpp:1:
    /usr/include/python2.7/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
     #warning "Using deprecated NumPy API, disable it by "
      ^~~~~~~
    /y/x/Text-Image-Augmentation/src/conversion.cpp:119:16: error: cannot declare variable 'g_numpyAllocator' to be of abstract type 'NumpyAllocator'
     NumpyAllocator g_numpyAllocator;
                    ^~~~~~~~~~~~~~~~
    /y/x/Text-Image-Augmentation/src/conversion.cpp:64:7: note: because the following virtual functions are pure within 'NumpyAllocator':
     class NumpyAllocator : public MatAllocator
           ^~~~~~~~~~~~~~
    In file included from /usr/include/opencv2/core.hpp:59:0,
                     from /usr/include/opencv2/imgproc.hpp:46,
                     from /usr/include/opencv2/imgproc/imgproc.hpp:48,
                     from /y/x/Text-Image-Augmentation/include/conversion.h:5,
                     from /y/x/Text-Image-Augmentation/src/conversion.cpp:1:
    /usr/include/opencv2/core/mat.hpp:417:23: note: virtual cv::UMatData* cv::MatAllocator::allocate(int, const int*, int, void*, size_t*, int, cv::UMatUsageFlags) const
     virtual UMatData* allocate(int dims, const int* sizes, int type,
                       ^~~~~~~~
    /usr/include/opencv2/core/mat.hpp:419:18: note: virtual bool cv::MatAllocator::allocate(cv::UMatData*, int, cv::UMatUsageFlags) const
     virtual bool allocate(UMatData* data, int accessflags, UMatUsageFlags usageFlags) const = 0;
                  ^~~~~~~~
    /usr/include/opencv2/core/mat.hpp:420:18: note: virtual void cv::MatAllocator::deallocate(cv::UMatData*) const
     virtual void deallocate(UMatData* data) const = 0;
                  ^~~~~~~~~~
    /y/x/Text-Image-Augmentation/src/conversion.cpp: In member function 'cv::Mat NDArrayConverter::toMat(const PyObject*)':
    /y/x/Text-Image-Augmentation/src/conversion.cpp:202:11: error: 'class cv::Mat' has no member named 'refcount'
     m.refcount = refcountFromPyObject(o);
       ^~~~~~~~
    /y/x/Text-Image-Augmentation/src/conversion.cpp: In member function 'PyObject* NDArrayConverter::toNDArray(const cv::Mat&)':
    /y/x/Text-Image-Augmentation/src/conversion.cpp:223:12: error: 'class cv::Mat' has no member named 'refcount'
     if(!p->refcount || p->allocator != &g_numpyAllocator)
            ^~~~~~~~
    /y/x/Text-Image-Augmentation/src/conversion.cpp:230:36: error: 'class cv::Mat' has no member named 'refcount'
     return pyObjectFromRefcount(p->refcount);
                                    ^~~~~~~~
    CMakeFiles/Augment.dir/build.make:75: recipe for target 'CMakeFiles/Augment.dir/src/conversion.cpp.o' failed
    make[2]: *** [CMakeFiles/Augment.dir/src/conversion.cpp.o] Error 1
    CMakeFiles/Makefile2:72: recipe for target 'CMakeFiles/Augment.dir/all' failed
    make[1]: *** [CMakeFiles/Augment.dir/all] Error 2
    Makefile:83: recipe for target 'all' failed
    make: *** [all] Error 2

Environment: Python 2.7.17, OpenCV 3.3.0, NumPy 1.13.3.

opened by insomnyac1 2

About the agent updating and initialization

I have two questions about the nice paper "Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition":

1. In Line 9 of Algorithm 1, why does the agent network update towards -S'? I don't understand why -S' is a harder moving state.
2. As for the agent initialization, what is the initialization direction of the 2*(N+1) fiducial points?

opened by PkuDavidGuan 1

Why is oldDotL set from DstPoints?

Could you help explain the following contradiction?

According to the paper, w_k is defined with respect to the fiducial point (control point) p_k, and hence oldDotL should represent the fiducial point here:
https://github.com/Canjie-Luo/Text-Image-Augmentation/blob/ab8e37a161ac5b3f7bc77962bbe5216e88118605/src/imgwarp_mls_similarity.cpp#L59-L60

But instead oldDotL is set to the deformed positions:
https://github.com/Canjie-Luo/Text-Image-Augmentation/blob/ab8e37a161ac5b3f7bc77962bbe5216e88118605/src/imgwarp_mls.cpp#L92-L98
https://github.com/Canjie-Luo/Text-Image-Augmentation/blob/ab8e37a161ac5b3f7bc77962bbe5216e88118605/src/Augment.cpp#L41-L53

opened by huntzhan 1

undefined symbol: _ZN2cv6formatB5cxx11EPKcz

Hello! I have built Augment.so, but this error appears when I run the script. Do you know what might be causing it? Thanks!

    ImportError: /home/sun/sunny/projects/Decoupled-attention-network/Scene-Text-Image-Transformer/Augment.so: undefined symbol: _ZN2cv6formatB5cxx11EPKcz

opened by Xiao-Ann 8

cmake problem

My test environment was Ubuntu 16.04. I did not install Boost with Anaconda as the instructions suggest, but the build still worked and produced Augment.so.

However, the same approach fails on our server running CentOS 7.4. I suspect that either the Boost 1.67 installation went wrong or this is a CMake error.

Boost installation steps:

Download boost_1_67_0.tar.gz.

Extract the archive and cd into it.

    ./bootstrap.sh --with-libraries=all --with-python=/home/kongtianning/anaconda3/envs/python2712/bin/python --with-python-version=2.7 --with-python-root=/home/kongtianning/anaconda3/envs/python2712 --prefix=/home/kongtianning/myboost
    ./b2
    ./b2 install

Next I followed your instructions. On Ubuntu with the system Python 2 this works, but on CentOS it does not:

    mkdir build
    cd build
    cmake -D CUDA_USE_STATIC_CUDA_RUNTIME=OFF ..

On CentOS I ran cmake like this:

    cmake -DPYTHON_INCLUDE_DIR=/home/kongtianning/anaconda3/envs/python2712/include/python2.7 -DPYTHON_LIBRARY=/home/kongtianning/anaconda3/envs/python2712/lib/ -DPYTHON_EXECUTABLE=/home/kongtianning/anaconda3/envs/python2712/bin/python -D CUDA_USE_STATIC_CUDA_RUNTIME=OFF ..

The result is that boost_python cannot be found:

    CMake Error at /usr/share/cmake/Modules/FindBoost.cmake:1138 (message):
      Unable to find the requested Boost libraries.
      Boost version: 1.67.0
      Boost include path: /usr/local/include
      Could not find the following Boost libraries:
              boost_python
      No Boost libraries were found. You may need to set BOOST_LIBRARYDIR to the directory containing Boost libraries or BOOST_ROOT to the location of Boost.
    Call Stack (most recent call first):
      CMakeLists.txt:18 (find_package)
    -- Configuring incomplete, errors occurred!
    See also "/home/kongtianning/PycharmProjects/HanWangProJectPython/imageAugment/build/CMakeFiles/CMakeOutput.log".

opened by tnkong 6
