Generate text images for training deep learning OCR models

Overview

New version released: https://github.com/oh-my-ocr/text_renderer

Text Renderer

Generate text images for training deep learning OCR models (e.g. CRNN). Supports both Latin and non-Latin text.

Setup

  • Ubuntu 16.04
  • Python 3.5+

Install dependencies:

pip3 install -r requirements.txt

Demo

By default, simply running python3 main.py will generate 20 text images and a labels.txt file in output/default/.

example1.jpg example2.jpg

example3.jpg example4.jpg
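If you want to consume the generated data from your own training code, a minimal loading sketch could look like the one below. It is not part of the repo and assumes each line of the label file is "<image id> <text>", which matches the default label format mentioned in the issues further down; verify it against your own output.

# Sketch: load generated labels into a dict mapping image id -> text.
def load_labels(label_path="output/default/labels.txt"):
    labels = {}
    with open(label_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            img_id, text = line.split(" ", 1)  # split only on the first space
            labels[img_id] = text
    return labels

print(len(load_labels()), "labeled images")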

Use your own data to generate images

  1. Run python3 main.py --help to see all optional arguments and their meanings, then put your own data in the corresponding folders.

  2. Configure text effects and their fractions in the configs/default.yaml file (or create a new config file and use it via the --config_file option). Here are some examples:

Effect name | Image
Origin (font size 25) | origin
Perspective transform | perspective
Random crop | rand_crop
Curve | curve
Light border | light border
Dark border | dark border
Random char space (big) | random char space big
Random char space (small) | random char space small
Middle line | middle line
Table line | table line
Under line | under line
Emboss | emboss
Reverse color | reverse color
Blur | blur
Text color | font_color
Line color | line_color
  3. Run main.py (see the example command below).
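For example, a run using a custom config file and corpus might look like this. The flags shown all appear elsewhere on this page; the config path and corpus directory are placeholders for your own files:

python3 main.py --config_file configs/my_config.yaml --corpus_dir ./data/my_corpus --corpus_mode chn --output_dir ./output/my_data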

Strict mode

For non-Latin languages (e.g. Chinese), it is very common that a font only supports a limited set of characters. In this case, you will get bad results like these:

bad_example1

bad_example2

bad_example3

Selecting fonts that support every char in --chars_file is annoying. Run main.py with the --strict option, and the renderer will retry fetching text from the corpus during generation until all chars in the text are supported by the chosen font.
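For example, a strict run against the bundled Chinese charset might look like this (adjust the chars file path to your own; both flags are described above):

python3 main.py --strict --chars_file ./data/chars/chn.txt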

Tools

You can use the check_font.py script to check how many chars in --chars_file are not supported by your fonts:

python3 tools/check_font.py

checking font ./data/fonts/eng/Hack-Regular.ttf
chars not supported(4971):
['', '', '广', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '','', '', '', ''...]
0 fonts support all chars(5071) in ./data/chars/chn.txt:
[]
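If you would rather run a similar check from your own code, a minimal sketch using fontTools is shown below. This is an independent illustration, not what check_font.py necessarily does internally:

# Sketch: list characters from a chars file that a font's cmap does not cover.
from fontTools.ttLib import TTFont

def unsupported_chars(font_path, chars_path):
    font = TTFont(font_path)
    cmap = font["cmap"].getBestCmap()  # unicode codepoint -> glyph name
    with open(chars_path, encoding="utf-8") as f:
        chars = set(f.read()) - {"\n", "\r", " "}
    return sorted(c for c in chars if ord(c) not in cmap)

missing = unsupported_chars("./data/fonts/eng/Hack-Regular.ttf", "./data/chars/chn.txt")
print("chars not supported(%d):" % len(missing))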

Generate image using GPU

If you want to use a GPU to generate images faster, first compile OpenCV with CUDA (see: Compiling OpenCV with CUDA support).

Then build the Cython part, and add the --gpu option when running main.py:

cd libs/gpu
python3 setup.py build_ext --inplace
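Before running with --gpu, it can be worth confirming that your cv2 build actually sees a CUDA device. A quick sanity check (not part of the repo) is:

import cv2

# 0 means OpenCV was built without CUDA or no CUDA device is visible;
# if the cv2.cuda module is missing entirely, the build has no CUDA support.
print(cv2.cuda.getCudaEnabledDeviceCount())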

Debug mode

Running python3 main.py --debug will save images with extra information, so you can see how the perspective transform works and all bounding/rotated boxes.

debug_demo

Todo

See https://github.com/Sanster/text_renderer/projects/1

Citing text_renderer

If you use text_renderer in your research, please consider using the following BibTeX entry.

@misc{text_renderer,
  author =       {weiqing.chu},
  title =        {text_renderer},
  howpublished = {\url{https://github.com/Sanster/text_renderer}},
  year =         {2021}
}
Comments
  • Change the text in the label file to Chinese character indices

    The default generated tmp_labels.txt contains lines like '00000001 命形式原始得令人吃惊'. I would like to change it to the form '44955828_2248996261.jpg 29 403 2 172 586 167 10 172 110 121', where the trailing 10 integers are the indices of the characters in char_std_5990.txt (a list of 5990 Chinese characters downloaded from the internet). With a large dataset, writing a separate conversion script would be time-consuming. Can the author's code be modified directly to do this conversion? If so, please advise. Thanks! (A standalone conversion sketch follows this issue.)

    opened by wqt2019 5
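    A minimal standalone conversion sketch for the question above (file names and label format taken from the question; this is not a change to the repo's code):

    # Convert "<image id> <text>" label lines into "<image id> <idx idx ...>" lines,
    # using the character order in char_std_5990.txt as the index table.
    def load_char_table(path="char_std_5990.txt"):
        with open(path, encoding="utf-8") as f:
            return {line.rstrip("\n"): i for i, line in enumerate(f)}

    def convert(labels_in="tmp_labels.txt", labels_out="labels_idx.txt"):
        char2idx = load_char_table()
        with open(labels_in, encoding="utf-8") as fin, \
             open(labels_out, "w", encoding="utf-8") as fout:
            for line in fin:
                img_id, text = line.rstrip("\n").split(" ", 1)
                indices = [str(char2idx[c]) for c in text if c in char2idx]
                fout.write("%s %s\n" % (img_id, " ".join(indices)))

    convert()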
  • python main.py

    Hi, sorry to bother you. When I run python main.py directly, I get an error on glob.glob(fonts_dir + '//*', recursive=True), complaining about the unexpected recursive argument. Removing the recursive argument lets it run, but it then fails while loading the font and txt files; removing the " / " lets the files load correctly (the lines in question are text_renderer/libs/font_utils.py line 18 and text_renderer/textrenderer/corpus.py line 20). Running python main.py again then keeps raising exceptions and printing Retry gen_image (the exception is at main.py line 71, so presumably gen_image() in render.py is failing). I ran this on Ubuntu 14 and Ubuntu 18. Does it only work on Ubuntu 16? Thank you.

    opened by caoyangcr7 5
  • Provide better GPU building support

    The previous GPU-related build did not work on Windows: there is no pkg-config for OpenCV on Windows.

    I made these changes:

    • Replace the pkg-config based OpenCV configuration with a CMake based one, implemented directly in setup.py
    • Support both Linux/Mac and Windows, i.e. G++/Clang++ for Linux/Mac, Visual Studio for Windows
    • Give precise build steps in build-gpu-libs.md

    Hope these changes help Windows users.

    opened by zchrissirhcz 3
  • File "main.py", line 75, in gen_img_retry

    Retry gen_img: not enough values to unpack (expected 3, got 2)
    Traceback (most recent call last):
      File "main.py", line 75, in gen_img_retry
        return renderer.gen_img(img_index)
      File "/opt/shakey/synetic/text_renderer/textrenderer/renderer.py", line 94, in gen_img
        word_img = self.noiser.apply(word_img)
      File "/opt/shakey/synetic/text_renderer/textrenderer/noiser.py", line 38, in apply
        return noise_func(img)
      File "/opt/shakey/synetic/text_renderer/textrenderer/noiser.py", line 58, in apply_uniform_noise
        row, col, channel = img.shape
    ValueError: not enough values to unpack (expected 3, got 2)

    The same retry and error repeat with apply_gauss_noise (noiser.py line 44) and apply_sp_noise (noiser.py line 69). (A sketch of a possible workaround follows this issue.)

    opened by cuimiao187561 2
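    The traceback above suggests the noiser receives a 2-dimensional (grayscale) array while each noise function unpacks three shape values. One possible local workaround (a sketch, not the repo's actual fix) is to normalize images to 3 channels before they reach the noiser, e.g. on word_img before self.noiser.apply(word_img):

    import cv2
    import numpy as np

    def ensure_3_channels(img: np.ndarray) -> np.ndarray:
        # Convert single-channel grayscale to BGR so that
        # "row, col, channel = img.shape" keeps working downstream.
        if img.ndim == 2:
            img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
        return img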
  • Hello, I'd like to ask: when generating a training set from the corpus, how can I make sure the dataset is balanced? For example, how can I ensure that every character appears at roughly the same frequency, and if the data is unbalanced, how much will it affect the final model?

    The corpus just needs to be placed as txt (utf8) files under the --corpus_dir directory; it is loaded recursively. Note that --corpus_mode must match the corpus, e.g. a Chinese corpus needs --corpus_mode=chn.

    To cover Chinese, English and digits, you can generate the data in three passes (with --output_dir pointing to the same directory). Taking 5,000,000 images as an example, you could generate 4,200,000 from a Chinese corpus, 600,000 from an English corpus, and 200,000 from random digits. Since the Chinese character set is by far the largest, it should take the biggest share, but I am not sure what the optimal ratio is... (A sketch of such a multi-pass run follows this issue.)

    Some reference corpora:

    • Chinese corpus: http://www.sogou.com/labs/resource/cs.php
    • Chinese/English Wikipedia dumps: https://dumps.wikimedia.org/enwiki/20180720/

    Originally posted by @Sanster in https://github.com/Sanster/text_renderer/issues/6#issuecomment-408352902

    opened by yugengde 2
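    A multi-pass run along the lines of this answer might look like the following. Only --corpus_dir, --corpus_mode=chn and --output_dir come from the answer itself; the directory names and the English corpus mode are assumptions, and the per-pass image count is whichever count option python3 main.py --help lists:

    python3 main.py --corpus_dir ./data/corpus/chn --corpus_mode chn --output_dir ./output/mixed
    python3 main.py --corpus_dir ./data/corpus/eng --corpus_mode eng --output_dir ./output/mixed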
  • Sample generation hangs in an Ubuntu 16 virtual machine

    I wanted to test this in an Ubuntu 16.04 virtual machine. Running python3 main.py directly just hangs and no images are generated.

    laoma@ubuntu:~/text_renderer-master$ python3 main.py
    Load fonts from /home/laoma/text_renderer-master/data/fonts/chn
    Total fonts num: 1
    Background num: 1
    Loading corpus from: ./data/corpus
    Loading chn corpus: 1/1
    Generate text images in ./output/default

    It gets stuck here. Is my environment not configured correctly?

    opened by hbulaoma 2
  • BUG: Why are words with fewer than 3 characters filtered out of the eng corpus?

    I found that this line

    if word != u'' and len(word) > 2:
        self.corpus.append(word)

    filters out many important words like "is", "no", ... I'll submit a pull request to fix this.

    opened by Luvata 1
  • Bump pyyaml from 5.1 to 5.4

    Bumps pyyaml from 5.1 to 5.4.

    Changelog

    Sourced from pyyaml's changelog.

    5.4 (2021-01-19)

    5.3.1 (2020-03-18)

    • yaml/pyyaml#386 -- Prevents arbitrary code execution during python/object/new constructor

    5.3 (2020-01-06)

    5.2 (2019-12-02)

    • Repair incompatibilities introduced with 5.1: the default Loader was changed, but several methods like add_constructor still used the old default (yaml/pyyaml#279 -- a more flexible fix for custom tag constructors; yaml/pyyaml#287 -- change default loader for yaml.add_constructor; yaml/pyyaml#305 -- change default loader for add_implicit_resolver, add_path_resolver)
    • Make FullLoader safer by removing python/object/apply from the default FullLoader yaml/pyyaml#347 -- Move constructor for object/apply to UnsafeConstructor
    • Fix bug introduced in 5.1 where quoting went wrong on systems with sys.maxunicode <= 0xffff yaml/pyyaml#276 -- Fix logic for quoting special characters
    • Other PRs: yaml/pyyaml#280 -- Update CHANGES for 5.1

    5.1.2 (2019-07-30)

    • Re-release of 5.1 with regenerated Cython sources to build properly for Python 3.8b2+

    ... (truncated)

    Commits
    • 58d0cb7 5.4 release
    • a60f7a1 Fix compatibility with Jython
    • ee98abd Run CI on PR base branch changes
    • ddf2033 constructor.timezone: _copy & deepcopy
    • fc914d5 Avoid repeatedly appending to yaml_implicit_resolvers
    • a001f27 Fix for CVE-2020-14343
    • fe15062 Add 3.9 to appveyor file for completeness sake
    • 1e1c7fb Add a newline character to end of pyproject.toml
    • 0b6b7d6 Start sentences and phrases for capital letters
    • c976915 Shell code improvements
    • Additional commits viewable in compare view


    dependencies 
    opened by dependabot[bot] 0
  • Running train.py gives a resize error

    File "/data/SSD/text_renderer-master/textrenderer/renderer.py", line 208, in crop_img
      dst = cv2.resize(dst, (dst_width, self.out_height), interpolation=cv2.INTER_CUBIC)
    cv2.error: OpenCV(4.4.0) /tmp/pip-req-build-p6arhee9/opencv/modules/imgproc/src/resize.cpp:3929: error: (-215:Assertion failed) !ssize.empty() in function 'resize'

    The run still finishes, but I am not sure whether the data ends up corrupted. (A sketch of a guard against this is shown below.)

    opened by XHQC 0
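    The assertion above fires when cv2.resize is given an empty crop. A minimal guard (a sketch, not the repo's code) is to check the crop before resizing and regenerate or drop that sample:

    import cv2

    def safe_resize(dst, dst_width, out_height):
        # Skip empty crops instead of letting cv2.resize assert on them.
        if dst is None or dst.size == 0 or dst_width <= 0:
            return None  # caller should regenerate or drop this sample
        return cv2.resize(dst, (dst_width, out_height), interpolation=cv2.INTER_CUBIC)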