Geometric Augmentation for Text Image

Overview

Text Image Augmentation

A general geometric augmentation tool for text images, introduced in the CVPR 2020 paper "Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition". The tool helps reduce overfitting and improve the robustness of text recognizers.

Note that this is a general toolkit. Please customize it for your specific task. If the repo benefits your work, please cite the papers listed under Citation.

News

  • 2020-02 The paper "Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition" was accepted to CVPR 2020. It is a preliminary attempt at smart augmentation.

  • 2019-11 The paper "Decoupled Attention Network for Text Recognition" was accepted to AAAI 2020. This augmentation tool was used in its handwritten text recognition experiments.

  • 2019-04 We applied this tool in the ReCTS competition of ICDAR 2019. Our ensemble model won the championship.

  • 2019-01 The similarity transformation was specifically customized for geometric augmentation of text images.

Requirements

We recommend Anaconda to manage the version of your dependencies. For example:

    conda install boost=1.67.0

Installation

Build library:

    mkdir build
    cd build
    cmake -D CUDA_USE_STATIC_CUDA_RUNTIME=OFF ..
    make

Copy Augment.so to the target folder and follow demo.py to use the tool:

    cp Augment.so ..
    cd ..
    python demo.py
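
If CMake reports that it cannot find PythonLibs (a failure that appears in the comments further down), it may help to pass the interpreter's paths explicitly. The sketch below prints candidate -DPYTHON_INCLUDE_DIR and -DPYTHON_LIBRARY flags for whichever Python runs it. The helper name cmake_python_flags is hypothetical, and whether the printed paths suit a given build is an assumption to verify.

```python
import os
import sysconfig


def cmake_python_flags():
    """Return CMake -D flags that point at the interpreter running this script."""
    include_dir = sysconfig.get_paths()["include"]
    lib_dir = sysconfig.get_config_var("LIBDIR") or ""
    lib_name = sysconfig.get_config_var("LDLIBRARY") or ""

    flags = ["-DPYTHON_INCLUDE_DIR=" + include_dir]

    # Only suggest a library flag when the library file really exists there.
    library = os.path.join(lib_dir, lib_name)
    if lib_dir and lib_name and os.path.isfile(library):
        flags.append("-DPYTHON_LIBRARY=" + library)
    return flags


# Paste the printed flags into the cmake command line.
print(" ".join(cmake_python_flags()))
```

Run it with the same interpreter that will import Augment.so, so that the headers and the library match.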

Demo

  • Distortion

  • Stretch

  • Perspective
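
The paper describes these transformations in terms of fiducial (control) points on the top and bottom borders of the image, 2*(N+1) of them for N patches, which are moved and then fed to a moving-least-squares warp. A rough sketch of generating and randomly moving such points follows. It assumes evenly spaced points and a uniform random offset along each axis, and all names are hypothetical. It illustrates the idea only and is not the tool's C++ implementation.

```python
import random


def fiducial_points(width, height, num_patches):
    """Evenly spaced points on the top and bottom borders: 2 * (N + 1) in total."""
    step = width / num_patches
    top = [(i * step, 0.0) for i in range(num_patches + 1)]
    bottom = [(i * step, float(height)) for i in range(num_patches + 1)]
    return top + bottom


def jitter(points, radius, rng):
    """Shift each point by a uniform random offset of at most `radius` per axis."""
    return [
        (x + rng.uniform(-radius, radius), y + rng.uniform(-radius, radius))
        for x, y in points
    ]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
src = fiducial_points(width=200, height=64, num_patches=4)
dst = jitter(src, radius=10, rng=rng)
print(len(src), "source points,", len(dst), "moved points")
```

The (src, dst) pairs are what a warp would consume. The sketch does not clamp moved points to the image area.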

Speed

Transforming an image of size (H:64, W:200) takes less than 3 ms on a 2.0 GHz CPU. The process can be accelerated further by running the augmentation on the fly in multi-process batch samplers, for example by setting "num_workers" in a PyTorch DataLoader.
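
One way to apply the augmentation on the fly is to wrap it in a map-style dataset, so that a loader with several worker processes spreads the cost. The sketch below assumes a hypothetical augment_fn callable (image in, image out) standing in for a call into the compiled Augment module. The class name and the prob parameter are illustrative and not part of the tool.

```python
import random


class OnTheFlyAugmentDataset:
    """Map-style dataset that augments a sample at the moment it is requested.

    It implements __len__ and __getitem__, the protocol that
    torch.utils.data.DataLoader expects from a map-style dataset.
    """

    def __init__(self, samples, augment_fn, prob=0.5):
        self.samples = samples        # list of (image, label) pairs
        self.augment_fn = augment_fn  # hypothetical callable: image -> image
        self.prob = prob              # chance of augmenting a given sample

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        image, label = self.samples[index]
        if random.random() < self.prob:
            image = self.augment_fn(image)
        return image, label


def flip_rows(image):
    """Stand-in augmentation that reverses each row; swap in the real call."""
    return [row[::-1] for row in image]


samples = [([[1, 2, 3], [4, 5, 6]], "abc")]
dataset = OnTheFlyAugmentDataset(samples, flip_rows, prob=1.0)
image, label = dataset[0]
print(image, label)
```

With PyTorch installed, an object like this can be passed to torch.utils.data.DataLoader with num_workers set above zero, so that each worker process augments its own samples. In practice the images would be arrays rather than nested lists.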

Improvement for Recognition

We compare the accuracy of CRNN trained with and without data augmentation, using only the corresponding small training set of each benchmark.

Dataset                      IIIT5K    IC13    IC15
Without Data Augmentation    40.8%     6.8%    8.7%
With Data Augmentation       53.4%     9.6%    24.9%

Citation

@inproceedings{luo2020learn,
  author = {Canjie Luo and Yuanzhi Zhu and Lianwen Jin and Yongpan Wang},
  title = {Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition},
  booktitle = {CVPR},
  year = {2020}
}

@inproceedings{wang2020decoupled,
  author = {Tianwei Wang and Yuanzhi Zhu and Lianwen Jin and Canjie Luo and Xiaoxue Chen and Yaqiang Wu and Qianying Wang and Mingxiang Cai},
  title = {Decoupled attention network for text recognition},
  booktitle = {AAAI},
  year = {2020}
}

@article{schaefer2006image,
  title={Image deformation using moving least squares},
  author={Schaefer, Scott and McPhail, Travis and Warren, Joe},
  journal={ACM Transactions on Graphics (TOG)},
  volume={25},
  number={3},
  pages={533--540},
  year={2006},
  publisher={ACM New York, NY, USA}
}

Acknowledgment

Thanks to the following developers for their contributions.

@keeofkoo

@cxcxcxcx

@Yati Sagade

Attention

The tool is free for academic research purposes only.

Comments
  • CMake fail

    Thanks for your code. When I compiled it according to the readme.md, I got the following error:

    Could NOT find PythonLibs (missing: PYTHON_LIBRARIES PYTHON_INCLUDE_DIRS)

    opened by huizhang0110 5
  • I got some trouble in 'make'

        [ 12%] Building CXX object CMakeFiles/Augment.dir/src/conversion.cpp.o
        In file included from /home/fbas/下载/Scene-Text-Image-Transformer-master/src/conversion.cpp:1:0:
        /home/fbas/下载/Scene-Text-Image-Transformer-master/include/conversion.h:8:33: fatal error: numpy/ndarrayobject.h: No such file or directory
        compilation terminated.
        CMakeFiles/Augment.dir/build.make:62: recipe for target 'CMakeFiles/Augment.dir/src/conversion.cpp.o' failed
        make[2]: *** [CMakeFiles/Augment.dir/src/conversion.cpp.o] Error 1
        CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/Augment.dir/all' failed
        make[1]: *** [CMakeFiles/Augment.dir/all] Error 2
        Makefile:83: recipe for target 'all' failed
        make: *** [all] Error 2

    opened by Jinwanqi 4
  • Floating point exception (core dumped) when processing images of different sizes

    If I resize all the images to the same size before transforming, the error does not occur. For reference, all the images have height 70 and a width between 200 and 400, so none of them is very small.

    opened by cuhk-hbsun 3
  • running into problems during make

    Cloned the repo and tried building it. Used the following command:

        cmake .. -D CUDA_USE_STATIC_CUDA_RUNTIME=OFF -DPYTHON_INCLUDE_DIR=$(python -c "from distutils.sysconfig import get_python_inc; print(get_python_inc())") -DPYTHON_LIBRARY=$(python -c "import distutils.sysconfig as sysconfig; print(sysconfig.get_config_var('LIBDIR'))")

    During make, getting this error log:

        In file included from /usr/include/python2.7/numpy/ndarraytypes.h:1809:0,
                         from /usr/include/python2.7/numpy/ndarrayobject.h:18,
                         from /y/x/Text-Image-Augmentation/include/conversion.h:8,
                         from /y/x/Text-Image-Augmentation/src/conversion.cpp:1:
        /usr/include/python2.7/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
        #warning "Using deprecated NumPy API, disable it by "
         ^~~~~~~
        /y/x/Text-Image-Augmentation/src/conversion.cpp:119:16: error: cannot declare variable 'g_numpyAllocator' to be of abstract type 'NumpyAllocator'
        NumpyAllocator g_numpyAllocator;
                       ^~~~~~~~~~~~~~~~
        /y/x/Text-Image-Augmentation/src/conversion.cpp:64:7: note: because the following virtual functions are pure within 'NumpyAllocator':
        class NumpyAllocator : public MatAllocator
              ^~~~~~~~~~~~~~
        In file included from /usr/include/opencv2/core.hpp:59:0,
                         from /usr/include/opencv2/imgproc.hpp:46,
                         from /usr/include/opencv2/imgproc/imgproc.hpp:48,
                         from /y/x/Text-Image-Augmentation/include/conversion.h:5,
                         from /y/x/Text-Image-Augmentation/src/conversion.cpp:1:
        /usr/include/opencv2/core/mat.hpp:417:23: note: virtual cv::UMatData* cv::MatAllocator::allocate(int, const int*, int, void*, size_t*, int, cv::UMatUsageFlags) const
        virtual UMatData* allocate(int dims, const int* sizes, int type,
                          ^~~~~~~~
        /usr/include/opencv2/core/mat.hpp:419:18: note: virtual bool cv::MatAllocator::allocate(cv::UMatData*, int, cv::UMatUsageFlags) const
        virtual bool allocate(UMatData* data, int accessflags, UMatUsageFlags usageFlags) const = 0;
                     ^~~~~~~~
        /usr/include/opencv2/core/mat.hpp:420:18: note: virtual void cv::MatAllocator::deallocate(cv::UMatData*) const
        virtual void deallocate(UMatData* data) const = 0;
                     ^~~~~~~~~~
        /y/x/Text-Image-Augmentation/src/conversion.cpp: In member function 'cv::Mat NDArrayConverter::toMat(const PyObject*)':
        /y/x/Text-Image-Augmentation/src/conversion.cpp:202:11: error: 'class cv::Mat' has no member named 'refcount'
        m.refcount = refcountFromPyObject(o);
          ^~~~~~~~
        /y/x/Text-Image-Augmentation/src/conversion.cpp: In member function 'PyObject* NDArrayConverter::toNDArray(const cv::Mat&)':
        /y/x/Text-Image-Augmentation/src/conversion.cpp:223:12: error: 'class cv::Mat' has no member named 'refcount'
        if(!p->refcount || p->allocator != &g_numpyAllocator)
               ^~~~~~~~
        /y/x/Text-Image-Augmentation/src/conversion.cpp:230:36: error: 'class cv::Mat' has no member named 'refcount'
        return pyObjectFromRefcount(p->refcount);
                                       ^~~~~~~~
        CMakeFiles/Augment.dir/build.make:75: recipe for target 'CMakeFiles/Augment.dir/src/conversion.cpp.o' failed
        make[2]: *** [CMakeFiles/Augment.dir/src/conversion.cpp.o] Error 1
        CMakeFiles/Makefile2:72: recipe for target 'CMakeFiles/Augment.dir/all' failed
        make[1]: *** [CMakeFiles/Augment.dir/all] Error 2
        Makefile:83: recipe for target 'all' failed
        make: *** [all] Error 2

    Versions: Python 2.7.17, OpenCV 3.3.0, NumPy 1.13.3.

    opened by insomnyac1 2
  • About the agent updating and initialization

    I have two questions about the nice paper "Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition":

    1. In line 9 of Algorithm 1, why does the agent network update towards -S'? I don't understand why -S' is a harder moving state.
    2. As for the agent initialization, what is the initialization direction of the 2*(N+1) fiducial points?
    
    opened by PkuDavidGuan 1
  • Why is oldDotL set from DstPoints?

    Could you help explain the following contradiction?

    According to the paper, w_k is defined with respect to the fiducial point (control point) p_k, and hence oldDotL should represent the fiducial points here: https://github.com/Canjie-Luo/Text-Image-Augmentation/blob/ab8e37a161ac5b3f7bc77962bbe5216e88118605/src/imgwarp_mls_similarity.cpp#L59-L60

    But instead oldDotL is set to the deformed positions: https://github.com/Canjie-Luo/Text-Image-Augmentation/blob/ab8e37a161ac5b3f7bc77962bbe5216e88118605/src/imgwarp_mls.cpp#L92-L98 https://github.com/Canjie-Luo/Text-Image-Augmentation/blob/ab8e37a161ac5b3f7bc77962bbe5216e88118605/src/Augment.cpp#L41-L53

    opened by huntzhan 1
  • undefined symbol: _ZN2cv6formatB5cxx11EPKcz

    Hello! I have built the Augment.so file, but this error appears when I run the script. Do you know what might be causing it? Thanks!

        ImportError: /home/sun/sunny/projects/Decoupled-attention-network/Scene-Text-Image-Transformer/Augment.so: undefined symbol: _ZN2cv6formatB5cxx11EPKcz

    opened by Xiao-Ann 8
  • cmake problem

    My test environment is Ubuntu 16.04. I did not install Boost through Anaconda as the instructions suggest, but the build still worked and produced Augment.so.

    On the server, which runs CentOS 7.4, the same approach fails. I suspect that either the Boost 1.67 installation went wrong or this is a CMake error.

    Boost installation steps:

    1. Download boost_1_67_0.tar.gz
    2. Extract the archive and cd into it
    3. ./bootstrap.sh --with-libraries=all --with-python=/home/kongtianning/anaconda3/envs/python2712/bin/python --with-python-version=2.7 --with-python-root=/home/kongtianning/anaconda3/envs/python2712 --prefix=/home/kongtianning/myboost
    4. ./b2
    5. ./b2 install

    Next I followed your instructions. With the system Python 2 on Ubuntu this works, but on CentOS it does not:

        mkdir build
        cd build
        cmake -D CUDA_USE_STATIC_CUDA_RUNTIME=OFF ..

    On CentOS I ran cmake like this:

        cmake -DPYTHON_INCLUDE_DIR=/home/kongtianning/anaconda3/envs/python2712/include/python2.7 -DPYTHON_LIBRARY=/home/kongtianning/anaconda3/envs/python2712/lib/ -DPYTHON_EXECUTABLE=/home/kongtianning/anaconda3/envs/python2712/bin/python -D CUDA_USE_STATIC_CUDA_RUNTIME=OFF ..

    The result is that boost_python cannot be found:

        CMake Error at /usr/share/cmake/Modules/FindBoost.cmake:1138 (message):
          Unable to find the requested Boost libraries.

          Boost version: 1.67.0

          Boost include path: /usr/local/include

          Could not find the following Boost libraries:

                  boost_python

          No Boost libraries were found. You may need to set BOOST_LIBRARYDIR to the directory containing Boost libraries or BOOST_ROOT to the location of Boost.
        Call Stack (most recent call first):
          CMakeLists.txt:18 (find_package)

        -- Configuring incomplete, errors occurred!
        See also "/home/kongtianning/PycharmProjects/HanWangProJectPython/imageAugment/build/CMakeFiles/CMakeOutput.log".

    opened by tnkong 6
Owner
Canjie Luo