Code for generating synthetic text images as described in "Synthetic Data for Text Localisation in Natural Images", Ankush Gupta, Andrea Vedaldi, Andrew Zisserman, CVPR 2016.

Ankush Gupta

Last update: Dec 28, 2022

SynthText

Synthetic Scene-Text Image Samples

The code in the master branch is for Python2. Python3 is supported in the python3 branch.

The main dependencies are:

pygame, opencv (cv2), PIL (Image), numpy, matplotlib, h5py, scipy

Generating samples

python gen.py --viz [--datadir <path-to-downloaded-renderer-data>]

where --datadir points to the renderer_data directory included in the data torrent. Specifying this datadir is optional; if it is not specified, the script will automatically download and extract the same renderer.tar.gz data file (~24 MB). This data file includes:

sample.h5: This is a sample h5 file which contains a set of 5 images along with their depth and segmentation information. Note that this is just an example; you are encouraged to add more images (along with their depth and segmentation information) to this database for your own use.
fonts: three sample fonts (add more fonts to this folder and then update fonts/fontlist.txt with their paths).
newsgroup: Text source (from the News Group dataset). This can be substituted with any text file. Look inside text_utils.py to see how the text inside this file is used by the renderer.
models/colors_new.cp: Color model (foreground/background text color model), learnt from the IIIT-5K word dataset.
models: Other cPickle files (char_freq.cp: frequency of each character in the text dataset; font_px2pt.cp: conversion from pt to px for various fonts. If you add a new font, make sure that the corresponding model is present in this file; if it is not, you can add it by adapting invert_font_size.py).
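Before running gen.py with extra fonts, it can help to check that every entry in fonts/fontlist.txt points at an existing file. The sketch below assumes one font path per line, relative to a root folder that you pass in; which folder the renderer actually resolves these paths against should be confirmed in text_utils.py. find_missing_fonts is a hypothetical helper, not part of SynthText, and it does not check the font_px2pt.cp entry that a new font also needs.

```python
import os


def find_missing_fonts(list_path, font_root):
    """Return entries of a font list file that do not exist under font_root.

    Assumption (hypothetical helper): one font path per line, relative to
    font_root; blank lines are skipped.
    """
    missing = []
    with open(list_path, encoding="utf-8") as f:
        for line in f:
            rel_path = line.strip()
            if not rel_path:
                continue  # ignore empty lines
            if not os.path.isfile(os.path.join(font_root, rel_path)):
                missing.append(rel_path)
    return missing
```

For example, find_missing_fonts("data/fonts/fontlist.txt", "data/fonts") returns an empty list when every listed font file is present under data/fonts.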

This script will generate random scene-text image samples and store them in an h5 file in results/SynthText.h5. If the --viz option is specified, the generated output will be visualized as the script runs; omit the --viz option to turn off the visualizations. If you want to visualize the results stored in results/SynthText.h5 later, run:

python visualize_results.py

Pre-generated Dataset

A dataset with approximately 800,000 synthetic scene-text images generated with this code can be found here.

Adding New Images

Segmentation and depth-maps are required to use new images as background. Sample scripts for obtaining these are available here.

predict_depth.m: MATLAB script to regress a depth mask for a given RGB image; uses the network of Liu et al. However, more recent works (e.g., this) might give better results.
run_ucm.m and floodFill.py: for getting segmentation masks using gPb-UCM.

For an explanation of the fields in sample.h5 (e.g., seg, area, label), please check this comment.
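Going by the names, area and label look like per-segment attributes of the seg map, kept aligned so that entries can be filtered together (as filter() in synthgen.py does with label[area > minArea]). As a hedged illustration only, assuming area[i] is the pixel count of segment label[i], such arrays could be derived from an integer segmentation map with NumPy. segment_areas and its ignore_label parameter are hypothetical, not SynthText code.

```python
import numpy as np


def segment_areas(seg, ignore_label=None):
    """Return (label, area) for an integer segmentation map.

    label[i] is a segment id present in seg and area[i] the number of pixels
    with that id. ignore_label (e.g. a value marking unlabelled pixels) is
    dropped if given; both the meaning of area and this option are assumptions.
    """
    labels, counts = np.unique(seg, return_counts=True)
    if ignore_label is not None:
        keep = labels != ignore_label
        labels, counts = labels[keep], counts[keep]
    return labels, counts
```

For seg = [[1, 1, 2], [0, 2, 2]] with ignore_label=0 this gives labels [1, 2] and areas [2, 3].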

Pre-processed Background Images

The 8,000 background images used in the paper, along with their segmentation and depth masks, are included in the same torrent as the pre-generated dataset under the bg_data directory. The files are:

filenames	description
`imnames.cp`	names of images which do not contain background text
`bg_img.tar.gz`	images (filter these using `imnames.cp`)
`depth.h5`	depth maps
`seg.h5`	segmentation maps
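Since the extracted bg_img.tar.gz images are meant to be filtered with imnames.cp, here is a minimal sketch of that step. It assumes imnames.cp unpickles to a sequence of image file names that match the extracted file names (compared by base name); that format and the helper name filter_backgrounds are assumptions. Under Python 3, imnames.cp can be loaded with pickle.load(f, encoding="latin1") on a file opened in binary mode.

```python
import os


def filter_backgrounds(image_dir, keep_names):
    """Return sorted paths of files in image_dir whose names appear in keep_names.

    Hypothetical helper: names are compared by base name, so entries in
    keep_names may carry a directory prefix.
    """
    keep = {os.path.basename(str(n)) for n in keep_names}
    return sorted(
        os.path.join(image_dir, name)
        for name in os.listdir(image_dir)
        if name in keep
    )
```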

Downloading without BitTorrent

Downloading with BitTorrent is strongly recommended. If that is not possible, the files are also available to download over HTTP from https://thor.robots.ox.ac.uk/~vgg/data/scenetext/preproc/<filename>, where <filename> can be:

filenames	size	md5 hash
`imnames.cp`	180K
`bg_img.tar.gz`	8.9G	3eac26af5f731792c9d95838a23b5047
`depth.h5`	15G	af97f6e6c9651af4efb7b1ff12a5dc1b
`seg.h5`	6.9G	1605f6e629b2524a3902a5ea729e86b2

Note: due to its large size, depth.h5 is also available for download as 3-part split files of 5G each. These part files are named depth.h5-00, depth.h5-01, and depth.h5-02. Download them using the path above, and put them together using cat depth.h5-0* > depth.h5. To download, use something like the following:

wget --continue https://thor.robots.ox.ac.uk/~vgg/data/scenetext/preproc/<filename>
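On systems without cat, the split parts can be joined and checked in Python. This sketch streams the parts in name order (depth.h5-00, -01, -02) into one file, like the cat command above, and computes an MD5 digest that can be compared with the hash listed for the file in the table above. join_parts and md5sum are illustrative helper names.

```python
import glob
import hashlib
import shutil

CHUNK = 16 * 1024 * 1024  # read/write in 16 MiB chunks to keep memory low


def join_parts(pattern, out_path):
    """Concatenate the files matching pattern, sorted by name, into out_path."""
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no files match {pattern!r}")
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out, CHUNK)
    return parts


def md5sum(path):
    """Return the hex MD5 digest of a (possibly very large) file."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            h.update(chunk)
    return h.hexdigest()
```

For example, join_parts("depth.h5-0*", "depth.h5"), then check that md5sum("depth.h5") equals af97f6e6c9651af4efb7b1ff12a5dc1b.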

use_preproc_bg.py provides sample code for reading this data.

Note: I do not own the copyright to these images.

Generating Samples with Text in non-Latin (English) Scripts

@JarveeLee has modified the pipeline for generating samples with Chinese text here.
@adavoudi has modified it for Arabic/Persian script (written right to left) here.
@MichalBusta has adapted it for a number of languages (e.g. Bangla, Arabic, Chinese, Japanese, Korean) here.
@gachiemchiep has adapted it for Japanese here.
@gungui98 has adapted it for Vietnamese here.
@youngkyung has adapted it for Korean here.
@kotomiDu has developed an interactive UI for generating images with text here.
@LaJoKoch has adapted it for German here.

Further Information

Please refer to the paper for more information, or contact me (email address in the paper).

Comments

How to make a vertical text dataset?

Hello, I've used your code to generate OCR datasets in many cases. Thanks for your efforts, really :D

But I want to make vertical text images.

This section, https://github.com/ankush-me/SynthText/blob/master/text_utils.py#L165, decides how the "text mask" is rendered, but it's hard to make vertical text data with it. At present, only horizontal text is produced.

On this line: https://github.com/ankush-me/SynthText/blob/master/text_utils.py#L219

Original code:

newrect = font.get_rect(ch)
newrect.y = last_rect.y
if i > mid_idx:
    newrect.topleft = (last_rect.topright[0]+2, newrect.topleft[1])
else:
    newrect.topright = (last_rect.topleft[0]-2, newrect.topleft[1])
newrect.centery = max(newrect.height, min(fsize[1] - newrect.height, newrect.centery + curve[i]))
try:
    bbrect = font.render_to(surf, newrect, ch, rotation=rots[i])
except ValueError:
    bbrect = font.render_to(surf, newrect, ch)
bbrect.x = newrect.x + bbrect.x
bbrect.y = newrect.y - bbrect.y
bbs.append(np.array(bbrect))
last_rect = newrect

My code:

newrect = font.get_rect(ch)
# newrect.y = last_rect.y
newrect.x = last_rect.x
if i > mid_idx:
    # print(ch, " <- right")
    # newrect.topleft = (last_rect.topright[0] + 2, newrect.topleft[1])
    # newrect.topleft = (last_rect.topright[0], newrect.topleft[1] + 2)
    # newrect.topleft = (newrect.topright[0], last_rect.topleft[1] + 2)
    # newrect.topright = (newrect.bottomright[0], last_rect.bottomright[1] + 3)
    newrect.topleft = (newrect.bottomleft[0], last_rect.bottomleft[1] + 2)
else:
    # print(ch, " <- left")
    # newrect.topright = (last_rect.topleft[0] - 2, newrect.topleft[1])
    # newrect.topright = (last_rect.topleft[0], newrect.topleft[1] - 2)
    # newrect.topright = (newrect.topright[0], last_rect.topleft[1] - 2)
    newrect.bottomright = (newrect.topright[0], last_rect.topright[1] - 2)
newrect.centery = max(newrect.height, min(fsize[1] - newrect.height, newrect.centery + curve[i]))
# newrect.centerx = max(newrect.width, min(fsize[0] - newrect.width, newrect.centerx + curve[i]))
try:
    bbrect = font.render_to(surf, newrect, ch, rotation=rots[i])
except ValueError:
    print("render error, without rotation..")
    bbrect = font.render_to(surf, newrect, ch)
bbrect.x = newrect.x + bbrect.x
bbrect.y = newrect.y - bbrect.y
bbs.append(np.array(bbrect))
last_rect = newrect

Sample results: [three screenshots]

Sample results with annotations: [three screenshots]

It partly works, but the results are not perfect. I really need your help.

Thanks.

opened by SSUHan 16
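One way to think about vertical layout, independent of pygame: place each glyph directly below the previous one and centre all glyphs on a common vertical axis. The sketch below uses plain (width, height) tuples as stand-ins for the rectangles returned by font.get_rect(ch), and a fixed gap in pixels; vertical_layout is a hypothetical helper that only computes positions and renders nothing.

```python
def vertical_layout(glyph_sizes, x_center, y_top, gap=2):
    """Return top-left (x, y) positions that stack glyphs from top to bottom.

    glyph_sizes: list of (width, height) per character (assumed to come from
    something like font.get_rect(ch)). Each glyph is centred on x_center and
    consecutive glyphs are separated vertically by `gap` pixels.
    """
    positions = []
    y = y_top
    for w, h in glyph_sizes:
        positions.append((x_center - w // 2, y))
        y += h + gap  # the next glyph starts below this one
    return positions
```

For example, vertical_layout([(10, 20), (6, 10), (8, 15)], x_center=50, y_top=0) returns [(45, 0), (47, 22), (46, 34)]. With this ordering, a per-character curve offset such as curve[i] would naturally shift glyphs along x rather than y, and the surface size would need to grow in height instead of width.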

Text on generated images appears to be cut.

Text on generated images appears to be cut. I just cloned the repo, switched to the python3 branch, ran gen.py, and added a cv2.imwrite() call inside visualize_results.py to save the generated images to disk. I am getting images like this:

Someone please help

opened by ravan786 15
When I run the code, it raises errors

When I run python gen.py --viz from the command line, I get:

~/SynthText-master$ python gen.py --viz
Traceback (most recent call last):
  File "gen.py", line 19, in <module>
    from synthgen import *
  File "/home/ubuntu/SynthText-master/synthgen.py", line 20, in <module>
    import text_utils as tu
  File "/home/ubuntu/SynthText-master/text_utils.py", line 12, in <module>
    from pygame import freetype
ImportError: cannot import name freetype

What is wrong?

opened by chrisjyw 15
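The traceback ends at from pygame import freetype in text_utils.py, which means the installed pygame does not provide the freetype module (or pygame itself cannot be imported). A small check that mirrors that import is sketched below; has_pygame_freetype is a hypothetical helper. Upgrading to a current pygame release (for example with pip install -U pygame) is the usual first thing to try, though the minimum version the renderer needs is not stated here.

```python
import importlib


def has_pygame_freetype():
    """Return True if `from pygame import freetype` would succeed."""
    try:
        importlib.import_module("pygame.freetype")
    except ImportError:  # also covers ModuleNotFoundError when pygame is absent
        return False
    return True


if __name__ == "__main__":
    if has_pygame_freetype():
        print("pygame.freetype is available")
    else:
        print("pygame.freetype is missing; try: pip install -U pygame")
```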
How to generate gt.mat file for new dataset?
Hello! I am trying to create a new dataset using a different set of texts/words and was able to generate images containing the new texts. However, I want to crop the individual texts using the code mentioned here: https://github.com/ankush-me/SynthText/issues/174, but it appears to need a "gt.mat" file. I tried to produce one by adding a line of code in visualize_results.py:

viz_textbb(rgb, [charBB], wordBB)
print(" image name : ", colorize(Color.RED, k, bold=True))
print(" ** no. of chars : ", colorize(Color.YELLOW, charBB.shape[-1]))
print(" ** no. of words : ", colorize(Color.YELLOW, wordBB.shape[-1]))
print(" ** text : ", colorize(Color.GREEN, txt))
gt_file = {"imnames": k, "wordBB": wordBB, "charBB": charBB, "txt": txt}  ## added this

and then saving gt_file as gt.mat using scipy.io. However, whenever I use this gt.mat with the code mentioned above to crop word patches, I get an error:

Traceback (most recent call last):
  File "crop.py", line 146, in <module>
    do_work(opts,synth_dat)
  File "crop.py", line 104, in do_work
    np.random.shuffle(i_range)
  File "mtrand.pyx", line 4529, in numpy.random.mtrand.RandomState.shuffle
  File "mtrand.pyx", line 4532, in numpy.random.mtrand.RandomState.shuffle
TypeError: 'range' object does not support item assignment

I suspect the error comes from the format of my generated gt.mat. How do I properly create a gt.mat file that can be used as input to the cropping code?

Thank you very much and thank you btw for this wonderful project. I've been learning it for weeks and love it so much!
opened by clairerity 13
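The traceback points at crop.py rather than at the .mat file: under Python 3, range() returns an immutable range object, and np.random.shuffle shuffles in place, so it fails with exactly this TypeError. A likely fix (not tested against crop.py itself) is to shuffle a NumPy index array instead. Separately, the gt_file dict above holds a single image; one hedged option is to collect values across all images and store each field as a 1 x N object array, which scipy.io.savemat writes as a MATLAB cell array. The field names follow gt_file; the one-cell-per-image layout and the helper names are assumptions about what the cropping code expects.

```python
import numpy as np


def shuffled_indices(n, seed=None):
    """Python 3-safe replacement for np.random.shuffle(range(n))."""
    rng = np.random.default_rng(seed)
    idx = np.arange(n)
    rng.shuffle(idx)  # shuffles the mutable array in place
    return idx


def to_cell(items):
    """Pack a list of per-image values into a 1 x N object array (a MATLAB cell row)."""
    cell = np.empty((1, len(items)), dtype=object)
    for i, item in enumerate(items):
        cell[0, i] = item
    return cell


def build_gt(imnames, word_bbs, char_bbs, texts):
    """Collect per-image annotations into a dict suitable for scipy.io.savemat."""
    return {
        "imnames": to_cell(imnames),
        "wordBB": to_cell(word_bbs),
        "charBB": to_cell(char_bbs),
        "txt": to_cell(texts),
    }
```

Usage, after accumulating the per-image lists in the visualize_results.py loop: scipy.io.savemat("gt.mat", build_gt(names, word_bbs, char_bbs, texts)).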
How to represent the ground truth of the text?

After you create your own pictures with text, how do you represent the ground truth of the text in these pictures? What is your method? Is it of the form (x, y, w, h)? And is the representation the same for tilted text?

opened by lanyuelvyun 12
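As I understand the readme shipped with the pre-generated dataset, wordBB and charBB are 2 x 4 x N arrays: the first axis selects x (0) or y (1), the second the four corners of each box (clockwise from top-left), and the last indexes the words or characters. If that holds, boxes are arbitrary quadrilaterals rather than (x, y, w, h), which is how tilted text is represented. A sketch converting such an array into polygons and, where an axis-aligned box is needed, into (x, y, w, h); the helper names are hypothetical.

```python
import numpy as np


def boxes_to_polygons(bb):
    """Convert a 2 x 4 x N box array into an N x 4 x 2 array of (x, y) corners.

    A single box stored as 2 x 4 (which can happen after MATLAB-style
    squeezing) is treated as N = 1.
    """
    bb = np.asarray(bb, dtype=float)
    if bb.ndim == 2:
        bb = bb[:, :, np.newaxis]
    return bb.transpose(2, 1, 0)  # axes: (box, corner, x/y)


def polygons_to_xywh(polys):
    """Axis-aligned (x, y, w, h) enclosing each quadrilateral (rotation is lost)."""
    mins = polys.min(axis=1)
    maxs = polys.max(axis=1)
    return np.concatenate([mins, maxs - mins], axis=1)
```

The quadrilateral form keeps the orientation of tilted text; the (x, y, w, h) form only gives the enclosing upright rectangle.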
Images location

@ankush-me Hi, where are the 5 images (hiking, indian+musicians, sandwich, sea, village) that the SynthText script uses stored, and where does the script load these images from? Is there some file? Thanks.

opened by Didier0 8
a bytes-like object is required, not 'str'

My TensorFlow version is 1.2 and Python is 3.5. I ran your code and got this error:

Traceback (most recent call last):
  File "gen.py", line 140, in <module>
    main(args.viz)
  File "gen.py", line 95, in main
    RV3 = RendererV3(DATA_PATH,max_time=SECS_PER_IMG)
  File "/home/tian/tensorflow/example/SynthText/synthgen.py", line 368, in __init__
    self.text_renderer = tu.RenderFont(data_dir)
  File "/home/tian/tensorflow/example/SynthText/text_utils.py", line 108, in __init__
    self.font_state = FontState(data_dir)
  File "/home/tian/tensorflow/example/SynthText/text_utils.py", line 421, in __init__
    self.char_freq = cp.load(f)
TypeError: a bytes-like object is required, not 'str'

Can you tell me how to solve it?

opened by Tian14267 8
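This error typically appears when Python 3 unpickles from a file opened in text mode: pickle needs a binary file object. The .cp model files were written by Python 2's cPickle, so under Python 3 they generally also need encoding="latin1" to decode Python 2 byte strings (NumPy-array pickles require it too). A sketch of a loader that could replace the failing self.char_freq = cp.load(f) call, assuming the file is otherwise a standard pickle; load_cp is a hypothetical name. The python3 branch mentioned at the top of this page may already include an equivalent change.

```python
import pickle


def load_cp(path):
    """Load a Python 2 cPickle file (.cp) under Python 3.

    The file is opened in binary mode ('rb'); encoding='latin1' maps Python 2
    8-bit strings to str, which is also what NumPy-array pickles require.
    """
    with open(path, "rb") as f:
        return pickle.load(f, encoding="latin1")
```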
Can't add a new (my own) font in the script - PLZ IF SOMEONE KNOWS, IT'S URGENT FOR ME, Ty

Hi all and @ankush-me,

Can someone help me? I don't understand how to add my own font to the script. In the code I found fontlist.txt, a file that is used as the font source. There I put my .ttf fonts, which I downloaded from here: https://www.1001fonts.com/text-fonts.html?page=1&fbclid=IwAR395plHPNmZemLoKmvpedbqRv0z8pUVU66np7LLWzia14GrAKWcZB1H5o4 and then the terminal shows me an error:

'Fabiolo' is the name of the font.

When I use the upper one (fontlist1.txt, mine), an error is shown. When I use the bottom one (fontlist.txt, originally from the SynthText script), it works fine every time.

Here you can see how I added paths and names to fontlist1.txt. I also added the font names like 'Fabiolo', 'Halida Sans', ... to Font Manager, but still nothing; it shows the same errors.

opened by Didier0 7

Is it a bug ? or just my understanding is wrong ?

In the filter() method of synthgen.py:

@staticmethod
def filter(seg,area,label):
    """
    Apply the filter.
    The final list is ranked by area.
    """
    good = label[area > TextRegions.minArea]
    area = area[area > TextRegions.minArea]
    filt,R = [],[]
    for idx,i in enumerate(good):
        mask = seg==i
        xs,ys = np.where(mask)

        coords = np.c_[xs,ys].astype('float32')
        rect = cv2.minAreaRect(coords)          
        box = np.array(cv2.cv.BoxPoints(rect))
        h,w,rot = TextRegions.get_hw(box,return_rot=True)

        f = (h > TextRegions.minHeight 
            and w > TextRegions.minWidth
            and TextRegions.minAspect < w/h < TextRegions.maxAspect
            and area[idx]/w*h > TextRegions.pArea)
        filt.append(f)
        R.append(rot)

When cv2.minAreaRect(coords) is called, I think the coordinates should be in the form (index along width, index along height). But coords is obtained with np.where(mask), so the coordinates are actually (index along height, index along width). So is this a bug? I also found that the four box points can't enclose the segment correctly. Or is it OK and my understanding is wrong?

opened by DeepInSearch 7
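On the coordinate question: np.where on a 2-D mask returns (row indices, column indices), i.e. (y, x), whereas OpenCV's point-based functions such as cv2.minAreaRect treat each point as (x, y). Passing (row, col) pairs therefore describes the segment mirrored about the main diagonal; whether that matters in filter() depends on how the resulting box is used afterwards, but mixing the two conventions is a common cause of boxes that do not cover the segment. A minimal sketch of building points in OpenCV's (x, y) order; mask_to_xy_points is a hypothetical helper.

```python
import numpy as np


def mask_to_xy_points(mask):
    """Return an (N, 2) float32 array of (x, y) points for the True pixels of mask.

    np.where returns (rows, cols) = (y, x); the columns are swapped so each
    row is (x, y), the order OpenCV functions such as cv2.minAreaRect expect.
    """
    ys, xs = np.where(mask)
    return np.c_[xs, ys].astype(np.float32)
```

Usage: rect = cv2.minAreaRect(mask_to_xy_points(seg == i)), then box = cv2.boxPoints(rect) on OpenCV 3 and later, where cv2.cv.BoxPoints is the older OpenCV 2.x name used in the snippet above.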

Image '67/fruits_129_18.jpg' does not match its gt?

The index of '67/fruits_129_18.jpg' in the entire dataset is 82567. There are only 3 words in the image, but the corresponding wordBB has shape (2,4,16) and charBB has shape (2,4,56).

I checked the md5 hash, which confirms the download is correct.

opened by SakuraRiven 7
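A quick consistency check for cases like this, as a sketch. As I understand the dataset readme, each txt entry is a text instance that may contain several whitespace-separated words (and lines), so the number of word boxes should equal the total number of whitespace-separated tokens in txt, and the number of character boxes the number of non-whitespace characters. annotation_counts is a hypothetical helper that only compares these counts; it treats a 2 x 4 box array as a single box.

```python
import numpy as np


def annotation_counts(txt, word_bb, char_bb):
    """Compare token counts in txt with the number of word/char boxes.

    Assumptions (sketch only): words are whitespace-separated tokens of the
    strings in txt, characters are their non-whitespace characters, and boxes
    are stored as 2 x 4 x N (a 2 x 4 array counts as N = 1).
    """
    if isinstance(txt, str):
        txt = [txt]
    tokens = [tok for s in txt for tok in str(s).split()]

    def n_boxes(bb):
        bb = np.asarray(bb)
        return 1 if bb.ndim == 2 else bb.shape[-1]

    return {
        "words_in_txt": len(tokens),
        "word_boxes": n_boxes(word_bb),
        "chars_in_txt": sum(len(tok) for tok in tokens),
        "char_boxes": n_boxes(char_bb),
    }
```

If words_in_txt and word_boxes disagree for an image, the annotation for that image is worth inspecting further.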
How to get the SynthText-image result?

Hello, please help me; I'm having trouble running the demo. I run python gen.py --viz and get ./results/SynthText.h5, but I cannot get the SynthText image results. How do I get the SynthText image results, like this?

opened by teresasun 7
Incorrect visualization of bboxes

I just downloaded this code and ran the sample without any problems; however, in the visualization pop-up window I get the following:

I just ran the command python gen.py --viz. I'm trying to check whether it is just a visualization error, but I can't find any information on the annotation labels in ./results/SynthText.h5, only the resulting image itself.

For example:

opened by DimTrigkakis 4
AssertionError and text placement parameters misunderstanding
Hi @ankush-me, I have this error and don't know what is wrong. Is the AssertionError important?

And can you explain what these text placement parameters are:

self.f_shrink = 0.90
self.max_shrink_trials = 5  # 0.9^5 ~= 0.6
self.p_flat = 0.10

because when I change them, nothing changes in the text on the image.
opened by Lane689 0
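The comment # 0.9^5 ~= 0.6 suggests how the first two parameters interact: each time the text does not fit, its size is multiplied by f_shrink, for at most max_shrink_trials attempts, so the smallest scale tried is about 0.9^5 ≈ 0.59 of the original. Under that reading (p_flat is not covered here), the schedule can be tabulated as below; shrink_schedule is a hypothetical helper. If placement usually succeeds on the first attempt, changing these two values would have little visible effect.

```python
def shrink_schedule(f_shrink=0.90, max_shrink_trials=5):
    """Scale factor after k shrink steps, k = 0 .. max_shrink_trials.

    Assumes the size is multiplied by f_shrink after every failed placement
    attempt; k = 0 is the unshrunk size.
    """
    return [f_shrink ** k for k in range(max_shrink_trials + 1)]
```

With the defaults this gives approximately 1.0, 0.9, 0.81, 0.729, 0.656, 0.590.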
Saving masks in folder

Hi all and @ankush-me,

While generating images, I can save the images with rendered text to a folder, but how can I save the text masks to another folder as .png, .jpg, or some other image format, not binary?

opened by Lane689 0
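Boolean (binary) masks usually just need converting to 8-bit before an image writer will save them as a visible grayscale image. A sketch, assuming each text mask is already available as a 2-D boolean or 0/1 NumPy array (how to obtain it from the renderer is not shown); mask_to_uint8, save_mask, and the file naming are hypothetical. PNG is preferable to JPEG for masks because JPEG compression is lossy and blurs the hard 0/255 edges.

```python
import os

import numpy as np


def mask_to_uint8(mask):
    """Map a boolean/0-1 mask to uint8 with 0 = background and 255 = text."""
    return (np.asarray(mask) > 0).astype(np.uint8) * 255


def save_mask(mask, out_dir, name):
    """Write mask as a grayscale PNG in out_dir using OpenCV (imported lazily)."""
    import cv2  # opencv is already a SynthText dependency

    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, name + ".png")
    if not cv2.imwrite(path, mask_to_uint8(mask)):
        raise IOError(f"could not write {path}")
    return path
```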
Alpha blending for better text visibility
Hello @ankush-me,

In poisson_reconstruct.py there are:

IM_TOP -> text instance
IM_BACK -> background image

It says:

combine images using poission editing. IM_TOP and IM_BACK should be of the same size.

But which images does the script combine?

Also, in poisson_reconstruct.py, what are gxs, gys, gxd, gyd? Are these the X and Y coordinates of something?

[gxs,gys] = get_grads(ims)
[gxd,gyd] = get_grads(imd)

Is the alpha blend done here?

elif mode=='blend': # from recursive call:
    # just do an alpha blend
    gx = gxs+gxd
    gy = gys+gyd

When I use 'blend' instead of 'max' in the function blit_images(), nothing happens. Why?
opened by Lane689 1
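On the naming: in Poisson image editing the gx/gy arrays are image gradients (per-pixel intensity differences along x and y), not coordinates; the s/d suffixes presumably denote the source and destination images passed to get_grads. The sketch below illustrates the general technique rather than the repository's code: forward-difference gradients, a 'max' rule that keeps whichever gradient is stronger at each pixel, a 'blend' rule that adds them, and, for comparison, plain alpha compositing, which is what "alpha blending" usually means. All function names are hypothetical.

```python
import numpy as np


def grads(im):
    """Forward-difference gradients of a 2-D image (last column/row set to 0)."""
    im = np.asarray(im, dtype=float)
    gx = np.zeros_like(im)
    gy = np.zeros_like(im)
    gx[:, :-1] = im[:, 1:] - im[:, :-1]  # change along x (columns)
    gy[:-1, :] = im[1:, :] - im[:-1, :]  # change along y (rows)
    return gx, gy


def mix_gradients(top, back, mode="max"):
    """Combine the gradient fields of two same-sized images.

    'max'  : per pixel, keep the gradient with the larger magnitude.
    'blend': add the two gradient fields.
    A Poisson solver would then integrate the mixed field (not shown).
    """
    gxs, gys = grads(top)
    gxd, gyd = grads(back)
    if mode == "max":
        use_top = np.hypot(gxs, gys) > np.hypot(gxd, gyd)
        return np.where(use_top, gxs, gxd), np.where(use_top, gys, gyd)
    if mode == "blend":
        return gxs + gxd, gys + gyd
    raise ValueError(f"unknown mode: {mode}")


def alpha_composite(fg, bg, alpha):
    """Plain alpha blending: alpha * fg + (1 - alpha) * bg, with alpha in [0, 1]."""
    alpha = np.clip(np.asarray(alpha, dtype=float), 0.0, 1.0)
    return alpha * np.asarray(fg, dtype=float) + (1.0 - alpha) * np.asarray(bg, dtype=float)
```

In the gradient domain, 'max' tends to preserve whichever image has the stronger edges at each pixel, while simple alpha compositing mixes intensities directly and does not adapt text colour to the background.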
