Overview

SynthText

Code for generating synthetic text images as described in "Synthetic Data for Text Localisation in Natural Images", Ankush Gupta, Andrea Vedaldi, Andrew Zisserman, CVPR 2016.

[Synthetic scene-text image samples]

The code in the master branch is for Python2. Python3 is supported in the python3 branch.

The main dependencies are:

pygame, opencv (cv2), PIL (Image), numpy, matplotlib, h5py, scipy

Generating samples

python gen.py --viz [--datadir <path-to-downloaded-renderer-data>]

where --datadir points to the renderer_data directory included in the data torrent. Specifying this directory is optional; if it is not given, the script will automatically download and extract the renderer.tar.gz data file (~24 MB). This data file includes:

  • sample.h5: This is a sample h5 file which contains a set of 5 images along with their depth and segmentation information. Note that this is just an example; you are encouraged to add more images (along with their depth and segmentation information) to this database for your own use.
  • fonts: three sample fonts (add more fonts to this folder and then update fonts/fontlist.txt with their paths).
  • newsgroup: Text source (from the News Group dataset). This can be substituted with any text file. Look inside text_utils.py to see how the text inside this file is used by the renderer.
  • models/colors_new.cp: Color-model (foreground/background text color model), learnt from the IIIT-5K word dataset.
  • models: Other cPickle files (char_freq.cp: frequency of each character in the text dataset; font_px2pt.cp: conversion from pt to px for various fonts). If you add a new font, make sure the corresponding model is present in font_px2pt.cp; if not, you can add it by adapting invert_font_size.py.

This script will generate random scene-text image samples and store them in an h5 file at results/SynthText.h5. If the --viz option is specified, the generated output is visualized as the script runs; omit --viz to turn off the visualizations. If you want to visualize the results stored in results/SynthText.h5 later, run:

python visualize_results.py
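
For reference, here is a minimal sketch of reading the generated database; the 'data' group and the charBB/wordBB/txt attributes mirror what visualize_results.py reads, but treat it as illustrative rather than canonical:

    import h5py

    # Read results/SynthText.h5: one dataset per generated image under
    # 'data', with character/word boxes and text stored as attributes.
    db = h5py.File('results/SynthText.h5', 'r')
    for k in db['data'].keys():
        rgb = db['data'][k][...]                 # H x W x 3 RGB image
        charBB = db['data'][k].attrs['charBB']   # 2 x 4 x n_chars corner points
        wordBB = db['data'][k].attrs['wordBB']   # 2 x 4 x n_words corner points
        txt = db['data'][k].attrs['txt']         # text instances
        print(k, rgb.shape, wordBB.shape)
    db.close()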

Pre-generated Dataset

A dataset with approximately 800,000 synthetic scene-text images generated with this code can be found here.

Adding New Images

Segmentation and depth-maps are required to use new images as background. Sample scripts for obtaining these are available here.

  • predict_depth.m: MATLAB script to regress a depth mask for a given RGB image; uses the network of Liu et al. However, more recent works (e.g., this) might give better results.
  • run_ucm.m and floodFill.py for getting segmentation masks using gPb-UCM.

For an explanation of the fields in sample.h5 (e.g., seg, area, label), please check this comment.
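
As a concrete illustration, here is a hedged sketch of appending a new background image to a database laid out like sample.h5; the group names ('image', 'depth', 'seg') and the 'area'/'label' attributes follow the fields referenced above, but verify the exact layout against gen.py before relying on it:

    import h5py
    import numpy as np

    # Hedged sketch: add one image, its depth map, and its segmentation
    # to a sample.h5-style database. The arrays below are placeholders;
    # substitute your own RGB image, predicted depth, and gPb-UCM labels.
    with h5py.File('data/sample.h5', 'a') as db:
        name = 'my_image.jpg'
        rgb = np.zeros((480, 640, 3), dtype=np.uint8)
        depth = np.zeros((480, 640), dtype=np.float32)
        seg = np.ones((480, 640), dtype=np.uint16)
        db['image'].create_dataset(name, data=rgb)
        db['depth'].create_dataset(name, data=depth)
        ds = db['seg'].create_dataset(name, data=seg)
        labels = np.unique(seg)                        # region ids present in seg
        ds.attrs['label'] = labels
        ds.attrs['area'] = np.array([(seg == l).sum() for l in labels])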

Pre-processed Background Images

The 8,000 background images used in the paper, along with their segmentation and depth masks, are included in the same torrent as the pre-generated dataset under the bg_data directory. The files are:

filename        description
imnames.cp      names of images which do not contain background text
bg_img.tar.gz   images (filter these using imnames.cp)
depth.h5        depth maps
seg.h5          segmentation maps

Downloading without BitTorrent

Downloading with BitTorrent is strongly recommended. If that is not possible, the files are also available over HTTP from https://thor.robots.ox.ac.uk/~vgg/data/scenetext/preproc/<filename>, where <filename> can be:

filename        size   md5 hash
imnames.cp      180K
bg_img.tar.gz   8.9G   3eac26af5f731792c9d95838a23b5047
depth.h5        15G    af97f6e6c9651af4efb7b1ff12a5dc1b
seg.h5          6.9G   1605f6e629b2524a3902a5ea729e86b2

Note: due to its large size, depth.h5 is also available for download as 3-part split files of 5G each. These part files are named depth.h5-00, depth.h5-01, depth.h5-02. Download them using the path above and join them with cat depth.h5-0* > depth.h5. To download, use something like the following:

wget --continue https://thor.robots.ox.ac.uk/~vgg/data/scenetext/preproc/<filename>
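
After downloading, the md5 hashes listed above can be verified; a minimal Python sketch:

    import hashlib

    # Stream the file in chunks so multi-GB downloads need not fit in memory.
    def md5sum(path, chunk=1 << 20):
        h = hashlib.md5()
        with open(path, 'rb') as f:
            for block in iter(lambda: f.read(chunk), b''):
                h.update(block)
        return h.hexdigest()

    assert md5sum('seg.h5') == '1605f6e629b2524a3902a5ea729e86b2'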

use_preproc_bg.py provides sample code for reading this data.

Note: I do not own the copyright to these images.

Generating Samples with Text in non-Latin (non-English) Scripts

  • @JarveeLee has modified the pipeline for generating samples with Chinese text here.
  • @adavoudi has modified it for Arabic/Persian script, which flows right-to-left, here.
  • @MichalBusta has adapted it for a number of languages (e.g., Bangla, Arabic, Chinese, Japanese, Korean) here.
  • @gachiemchiep has adapted it for Japanese here.
  • @gungui98 has adapted it for Vietnamese here.
  • @youngkyung has adapted it for Korean here.
  • @kotomiDu has developed an interactive UI for generating images with text here.
  • @LaJoKoch has adapted it for German here.

Further Information

Please refer to the paper for more information, or contact me (email address in the paper).

Comments
  • How to make Vertical Text Dataset?

    Hello, I've used your code for generating OCR datasets in many cases. Thanks for your efforts, really :D

    But I want to make vertical text images.

    This section decides the "text mask" rendering: https://github.com/ankush-me/SynthText/blob/master/text_utils.py#L165. It's hard to make vertical text data; at present, only horizontal characters are rendered.

    On this line: https://github.com/ankush-me/SynthText/blob/master/text_utils.py#L219

    Original code:

    newrect = font.get_rect(ch)
    newrect.y = last_rect.y
    if i > mid_idx:
        newrect.topleft = (last_rect.topright[0]+2, newrect.topleft[1])
    else:
        newrect.topright = (last_rect.topleft[0]-2, newrect.topleft[1])
    newrect.centery = max(newrect.height, min(fsize[1] - newrect.height, newrect.centery + curve[i]))
    try:
        bbrect = font.render_to(surf, newrect, ch, rotation=rots[i])
    except ValueError:
        bbrect = font.render_to(surf, newrect, ch)
    bbrect.x = newrect.x + bbrect.x
    bbrect.y = newrect.y - bbrect.y
    bbs.append(np.array(bbrect))
    last_rect = newrect
    

    My code:

    newrect = font.get_rect(ch)
    # newrect.y = last_rect.y
    newrect.x = last_rect.x
    if i > mid_idx:
        # print(ch, " <- right")
        # newrect.topleft = (last_rect.topright[0] + 2, newrect.topleft[1])
        # newrect.topleft = (last_rect.topright[0], newrect.topleft[1] + 2)
        # newrect.topleft = (newrect.topright[0], last_rect.topleft[1] + 2)
        # newrect.topright = (newrect.bottomright[0], last_rect.bottomright[1] + 3)
        newrect.topleft = (newrect.bottomleft[0], last_rect.bottomleft[1] + 2)
    else:
        # print(ch, " <- left")
        # newrect.topright = (last_rect.topleft[0] - 2, newrect.topleft[1])
        # newrect.topright = (last_rect.topleft[0], newrect.topleft[1] - 2)
        # newrect.topright = (newrect.topright[0], last_rect.topleft[1] - 2)
        newrect.bottomright = (newrect.topright[0], last_rect.topright[1] - 2)
    newrect.centery = max(newrect.height, min(fsize[1] - newrect.height, newrect.centery + curve[i]))
    # newrect.centerx = max(newrect.width, min(fsize[0] - newrect.width, newrect.centerx + curve[i]))
    try:
        bbrect = font.render_to(surf, newrect, ch, rotation=rots[i])
    except ValueError:
        print("render error, without rotation..")
        bbrect = font.render_to(surf, newrect, ch)
    bbrect.x = newrect.x + bbrect.x
    bbrect.y = newrect.y - bbrect.y
    bbs.append(np.array(bbrect))
    last_rect = newrect
    

    Sample results: [screenshots]

    Sample results with annotations: [screenshots]

    The possibilities are there, but the results are not perfect. I really need your help.

    Thanks.

    opened by SSUHan 16
  • Text on generated images appears to be cut.

    Text on generated images appears to be cut. I just cloned the repo, switched to the python3 branch, ran gen.py, and added a cv2.imwrite() inside visualize_results.py to save the generated images. I am getting images like these: [sea_15, sandwich_96, indian_musician, hiking_125]

    Someone please help

    opened by ravan786 15
  • when i run the code , it generates the errors

    When I use python gen.py --viz on the command line, it generates:

    ~/SynthText-master$ python gen.py --viz
    Traceback (most recent call last):
      File "gen.py", line 19, in <module>
        from synthgen import *
      File "/home/ubuntu/SynthText-master/synthgen.py", line 20, in <module>
        import text_utils as tu
      File "/home/ubuntu/SynthText-master/text_utils.py", line 12, in <module>
        from pygame import freetype
    ImportError: cannot import name freetype

    What is wrong?

    opened by chrisjyw 15
  • How to generate gt.mat file for new dataset?

    Hello! I am trying to create a new dataset using a different set of texts/words and was able to generate images containing the new texts. However, I want to crop the individual texts using the code mentioned here: https://github.com/ankush-me/SynthText/issues/174, but it appears that it needs a "gt.mat" file. I tried to create one by adding a line of code within visualize_results.py:

    viz_textbb(rgb, [charBB], wordBB)
    print ("     image name   : ", colorize(Color.RED, k, bold=True))
    print ("  ** no. of chars : ", colorize(Color.YELLOW, charBB.shape[-1]))
    print ("  ** no. of words : ", colorize(Color.YELLOW, wordBB.shape[-1]))
    print ("  ** text         : ", colorize(Color.GREEN, txt))
    gt_file = {"imnames": k, "wordBB": wordBB, "charBB": charBB, "txt": txt} ## added this
    

    then eventually saving gt_file as gt.mat using scipy.io. However, whenever I feed this gt.mat to the cropping code mentioned above, I get an error:

    Traceback (most recent call last):
      File "crop.py", line 146, in <module>
        do_work(opts,synth_dat)
      File "crop.py", line 104, in do_work
        np.random.shuffle(i_range)
      File "mtrand.pyx", line 4529, in numpy.random.mtrand.RandomState.shuffle
      File "mtrand.pyx", line 4532, in numpy.random.mtrand.RandomState.shuffle
    TypeError: 'range' object does not support item assignment
    

    I feel like the error is coming from my generated gt.mat because of its format. I would just like to ask how to properly create a gt.mat file that can be used as input to the cropping code.

    Thank you very much and thank you btw for this wonderful project. I've been learning it for weeks and love it so much!
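
    For reference, the TypeError itself is a Python 3 issue: np.random.shuffle cannot shuffle a range object in place, so crop.py presumably needs np.arange or list(range(...)). As for the format, the released ground truth stores imnames, wordBB, charBB, and txt as 1xN MATLAB cell arrays; a hedged sketch of assembling gt.mat that way from results/SynthText.h5:

    import h5py
    import numpy as np
    import scipy.io as sio

    # 1xN object arrays are written by scipy.io.savemat as MATLAB cell
    # arrays, matching the layout of the released gt.mat.
    def to_cell(items):
        cell = np.empty((1, len(items)), dtype=object)
        for i, v in enumerate(items):
            cell[0, i] = v
        return cell

    db = h5py.File('results/SynthText.h5', 'r')
    keys = sorted(db['data'].keys())
    gt = {
        'imnames': to_cell(keys),
        'wordBB': to_cell([db['data'][k].attrs['wordBB'] for k in keys]),
        'charBB': to_cell([db['data'][k].attrs['charBB'] for k in keys]),
        'txt': to_cell([list(db['data'][k].attrs['txt']) for k in keys]),
    }
    db.close()
    sio.savemat('gt.mat', gt)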

    opened by clairerity 13
  • How to represent the ground truth of the text ?

    After you create your own pictures with text, how do you represent the ground truth of the text on these pictures? What is your method? Is it of the form (x, y, w, h)? And is the representation the same for tilted text?

    opened by lanyuelvyun 12
  • Images location

    @ankush-me Hi, where are the 5 images (hiking, indian+musicians, sandwich, sea, village) that the SynthText script uses stored, and from where does the script load them? Is there some file? Thanks.

    opened by Didier0 8
  • a bytes-like object is required, not 'str'

    My tensorflow is 1.2 and python is 3.5. I ran your code and got this error:

    Traceback (most recent call last):
      File "gen.py", line 140, in <module>
        main(args.viz)
      File "gen.py", line 95, in main
        RV3 = RendererV3(DATA_PATH, max_time=SECS_PER_IMG)
      File "/home/tian/tensorflow/example/SynthText/synthgen.py", line 368, in __init__
        self.text_renderer = tu.RenderFont(data_dir)
      File "/home/tian/tensorflow/example/SynthText/text_utils.py", line 108, in __init__
        self.font_state = FontState(data_dir)
      File "/home/tian/tensorflow/example/SynthText/text_utils.py", line 421, in __init__
        self.char_freq = cp.load(f)
    TypeError: a bytes-like object is required, not 'str'

    Can you tell me how to solve it ?
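
    For reference, this error typically means a pickle file was opened in text mode under Python 3; a minimal sketch of the kind of fix that usually applies (the path below is illustrative):

    import pickle as cp

    # Under Python 3, pickle files must be opened in binary mode ('rb');
    # opening with 'r' raises "a bytes-like object is required, not 'str'".
    with open('data/models/char_freq.cp', 'rb') as f:
        char_freq = cp.load(f)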

    opened by Tian14267 8
  • Can't add a new (my own) font in the script - PLZ IF SOMEONE KNOWS, IT'S URGENTLY FOR ME, Ty

    Hi all and @ankush-me ,

    Can someone help me? I don't understand how to add my own font to the script. I found fontlist.txt in the code, the file that is used as the font source. So there I put the paths of the .ttf fonts that I downloaded from here: https://www.1001fonts.com/text-fonts.html?page=1&fbclid=IwAR395plHPNmZemLoKmvpedbqRv0z8pUVU66np7LLWzia14GrAKWcZB1H5o4 and then the terminal shows me an error:

    'Fabiolo' is the name of the font.

    [screenshot]

    When I use the upper one (fontlist1.txt, mine) it shows an error. When I use the bottom one (fontlist.txt, originally from the SynthText script) it works fine every time.

    [screenshot]

    Here you can see how I added paths and names into fontlist1.txt. I also added the font names like 'Fabiolo', 'Halida Sans', ... into Font Manager, but still nothing; it shows the same errors.

    [screenshot]

    opened by Didier0 7
  • Is it a bug ? or just my understanding is wrong ?

    In the filter() method in synthgen.py:

    @staticmethod
    def filter(seg,area,label):
        """
        Apply the filter.
        The final list is ranked by area.
        """
        good = label[area > TextRegions.minArea]
        area = area[area > TextRegions.minArea]
        filt,R = [],[]
        for idx,i in enumerate(good):
            mask = seg==i
            xs,ys = np.where(mask)
    
            coords = np.c_[xs,ys].astype('float32')
            rect = cv2.minAreaRect(coords)          
            box = np.array(cv2.cv.BoxPoints(rect))
            h,w,rot = TextRegions.get_hw(box,return_rot=True)
    
            f = (h > TextRegions.minHeight 
                and w > TextRegions.minWidth
                and TextRegions.minAspect < w/h < TextRegions.maxAspect
                and area[idx]/w*h > TextRegions.pArea)
            filt.append(f)
            R.append(rot)
    

    When it uses cv2.minAreaRect(coords), I think the coordinates should be of the form (index in width, index in height). But coords is obtained using np.where(mask), so the coords are actually (index in height, index in width). So is this a bug? I also find that the four box points don't enclose the segment correctly. Or is it fine and my understanding is wrong?
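
    A small self-contained illustration of the coordinate-order concern raised above (not taken from the repo):

    import numpy as np

    # np.where returns (row, col) = (y, x); OpenCV point APIs such as
    # cv2.minAreaRect expect (x, y) points, so the axes end up swapped
    # unless the columns are reordered, e.g. np.c_[ys, xs].
    mask = np.zeros((4, 8), dtype=bool)
    mask[1, 5] = True           # one pixel at row 1 (y), column 5 (x)
    xs, ys = np.where(mask)     # named as in filter() above
    print(xs, ys)               # -> [1] [5]: actually (row, col)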

    opened by DeepInSearch 7
  • Image '67/fruits_129_18.jpg' does not match its gt?

    The index of '67/fruits_129_18.jpg' in the entire dataset is 82567. There are only 3 words in the image but the corresponding wordBB has shape (2,4,16), charBB has shape (2,4,56).

    The md5 value has been checked, which confirms the download is correct.

    opened by SakuraRiven 7
  • How to get the SynthText-image result?

    Hello, please help me; I am having trouble running the demo. I run python gen.py --viz and get ./result/SynthText.h5, but I cannot get the SynthText image results. How do I get the SynthText image results, like this?

    opened by teresasun 7
  • Incorrect visualization of bboxes

    I just downloaded this code and ran the sample without any problems; however, in the visualization popup window I get the following:

    [screenshot]

    I just run the command python gen.py --viz. I'm trying to see if it is just a visualization error, but I can't find any information on the annotation labels under ./results/SynthText.h5, only the resulting image itself.

    For example:

    [screenshot]

    opened by DimTrigkakis 4
  • AssertionError and text placement parameters misunderstanding

    Hi @ankush-me, I have this error and don't know what is incorrect. Is the AssertionError important? [screenshot]

    And can you explain these text placement parameters:

    self.f_shrink = 0.90
    self.max_shrink_trials = 5 # 0.9^5 ~= 0.6
    self.p_flat = 0.10
    

    because when I change them, nothing changes about the text on the image.

    opened by Lane689 0
  • Saving masks in folder

    Hi all and @ankush-me ,

    while generating images I can save the images with rendered text in a folder, but how can I save the text masks to some other folder in .png, .jpg, or some other image format, not binary?

    opened by Lane689 0
  • Alpha blending for better text visibility

    Hello @ankush-me,

    In poisson_reconstruct.py there is:

    IM_TOP -> text instance
    IM_BACK -> background image

    It says:

    combine images using poission editing.
    IM_TOP and IM_BACK should be of the same size.
    

    But which images does the script combine?

    Also, in poisson_reconstruct.py, what are these gxs, gys, gxd, gyd? Are these X and Y coordinates of something?

    [gxs,gys] = get_grads(ims)
    [gxd,gyd] = get_grads(imd)
    

    Is the alpha blend done here?

    elif mode=='blend': # from recursive call:
        # just do an alpha blend
        gx = gxs+gxd
        gy = gys+gyd
    

    When I use 'blend' instead of 'max' in the function blit_images(), nothing happens?
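
    To make the question concrete: one plausible reading, stated as an assumption rather than a description of the repo's code, is that gxs/gys are the horizontal/vertical image gradients of the source ims and gxd/gyd those of the destination imd. A minimal sketch of a gradient helper with that shape:

    import numpy as np

    # Forward-difference image gradients along x (columns) and y (rows).
    def get_grads(im):
        im = np.asarray(im, dtype=float)
        gx = np.zeros_like(im)
        gy = np.zeros_like(im)
        gx[:, :-1] = im[:, 1:] - im[:, :-1]   # horizontal (x) gradient
        gy[:-1, :] = im[1:, :] - im[:-1, :]   # vertical (y) gradient
        return gx, gy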

    opened by Lane689 1